Preface


Some years ago we noticed that various seemingly disparate fields were 
using similar models and techniques to solve similar problems. From math- 
ematics to computer science to engineering to biology, various forms of dif- 
ference equations were appearing in research papers and in textbooks, but 
with no common background, the same results were being independently 
derived over and over again. As we were noticing this, some mathematics 
curricula were being revised with discrete mathematics replacing calculus 
as the first college mathematics course. New discrete mathematics courses 
were created, and several superb textbooks appeared. In some fields a year 
of discrete mathematics actually replaced a year of calculus, while in other 
fields students took both discrete mathematics and calculus. 

With these changes, what happened to difference equations? Some texts
in discrete mathematics ignored them. Others had a few examples of
difference equations as applications of proof by induction. Still others
devoted a chapter to difference equations, but only solved a few special
cases and/or presented generating functions (also called Z-transforms) as
the principal or only method for finding solutions. With this lack of common
background, texts on algorithms, signal processing, and population biology 
were still forced to devote chapters to the difference equations used in their 
areas. Even students who took several of these courses had difficulty seeing 
that they were working with the same difference equations in different con- 
texts. Many instructors had written notes to flesh out the coverage given in 
texts, but such notes were of necessity usually so terse that students were 
led to believe that difference equations were very complicated and hard to 
understand. 
With these problems in mind, we set out to write a book on difference
equations that is accessible to undergraduates. As a text, it is meant for 
undergraduate majors in one of the mathematical sciences, presumably in 
their junior or senior year. We’ve written it for the student who likes to 
compute and is comfortable with mathematical proof, but the book can 
be profitably read by students who approach the subject from either a 
computational or theoretical point of view. 

We wanted our text to have an algorithmic spirit. In this book, each 
chapter leads to techniques that can be applied by hand to small examples 
and also can be programmed for larger examples. In many cases we give 
explicit algorithms, which we decided to write in pseudocode rather than 
in a specific programming language for several reasons. First, it is easy 
to translate from pseudocode into any reasonable programming language. 
Second, there are many programming languages available, and translating 
from one language to another is often more difficult than translating from 
pseudocode. Third, we are not sure that programming these algorithms is 
worth the effort, because for almost all of our examples there are high- 
quality implementations readily available on the Web. It probably makes 
more sense to use one of these programs rather than to cobble together a 
program that will be used only a few times and/or will be prone to problems 
when the input is not exactly in the form assumed by the programmer. A 
number of mathematically oriented computer packages are also available. 
For example, MATLAB, Maple, and Mathematica all have packages that 
will solve difference equations and recurrence relations. In many cases these 
packages give numeric answers as well as symbolic solutions when possible. 
Using these packages is much simpler than programming from scratch. 

In this book we start with the old story of Fibonacci’s rabbits and 
progress through several generalizations, ending with some nonlinear differ- 
ence equations. We deal with familiar mathematical structures such as the 
real numbers, the complex numbers, the integers, and the integers modulo 
an integer. We were tempted to discuss more general structures in order to 
show, for example, how theories of computation could be represented as dif- 
ference equations, but we soon discovered that this would result in either a 
very large book or a very formal book, which would be at variance with our 
goal of accessibility. After developing the theory and techniques for solving 
linear difference equations in Chapters 2 to 4, we specialize to equations 
with nonnegative coefficients in Chapters 5 and 6 and then consider the 
generalization to matrix difference equations in Chapter 7. Chapter 8 con- 
siders equations over other rings, including integers modulo m and finite 
fields. Chapter 9 considers some issues in computational complexity, includ- 
ing divide-and-conquer algorithms. We end with some nonlinear systems in 
Chapter 10. Along the way we use linear algebra, develop formal power 
series, solve some combinatorial problems, visit Perron-Frobenius theory,
use graph theory, discuss pseudorandom number generation and integer 
factorization, and use the FFT to multiply polynomials quickly. 
There are four appendices serving different purposes. The first is a
collection of worked examples, which are meant to supplement the early chapters
of the book. Because the material in Appendices B and C is essential to 
an understanding of the book, we suggest working through them before 
beginning Chapter 2. Although many of the difference equations we con- 
sider have integer or real coefficients, it is often necessary to consider the 
coefficients as complex numbers. Appendix B gives the highlights of the 
complex analysis we use, and no prior experience is necessary to under- 
stand this appendix. On the other hand, only the most exceptional student 
could learn new material at the rate at which linear algebra is presented in 
Appendix C. One of the aims of this book is to show students that linear 
algebra is a powerful and coherent subject whose ideas have diverse appli- 
cations, and we hope Appendix C is a helpful review. Appendix D outlines 
a method of Morris Marden [105] that can be used to decide when the 
general solution of a difference equation converges to zero. This appendix 
is not needed for an understanding of the book. 
Most of our examples work with “small” difference equations, equations
that can be completely solved by hand. In particular, for these equations 
their characteristic polynomials can be found, the roots of these polyno- 
mials can be computed exactly, and the associated eigenvector equations 
can be solved. While the theory we develop applies to both small and 
large equations, these computations may be difficult or impossible for large 
equations. For example, actually factoring polynomials is not possible in 
general, and rational computation of characteristic polynomials may re- 
quire numbers with very many digits. Numerical approximation methods 
are often used for these computations, and we refer the interested reader 
to Acton [1], who gives a good introduction to numerical methods. (More 
serious users might refer to the compendium [131] or to the classic [170] by 
Wilkinson.) In general, we do not cover numerical methods. The one excep- 
tion to this rule is our discussion of the use of Newton’s method for finding 
the positive root of a nonnegative polynomial. We include this method for 
several reasons: it rapidly finds this root, the proof of its convergence and 
its speed of convergence are relatively easy, and the method is an example 
of a commonly encountered nonlinear difference equation. 
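As a small illustration of that last point (our own, not taken from the chapters), Newton's method for a root of a polynomial $p$ is the first-order nonlinear difference equation $x_{n+1} = x_n - p(x_n)/p'(x_n)$. The Python sketch below assumes, purely for illustration, the polynomial $p(x) = x^2 - x - 1$, whose positive root is the golden mean $(1+\sqrt{5})/2$ of Chapter 1, and an arbitrary starting value $x_0 = 2$; the function name `newton_golden` is hypothetical.

```python
def newton_golden(x0=2.0, steps=6):
    """Newton's method for p(x) = x^2 - x - 1 (an illustrative choice of p).

    Each step applies the nonlinear first-order difference equation
        x_{n+1} = x_n - p(x_n) / p'(x_n).
    Returns the list [x_0, x_1, ..., x_steps].
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        p = x * x - x - 1        # p(x)
        dp = 2 * x - 1           # p'(x)
        xs.append(x - p / dp)
    return xs


golden = (1 + 5 ** 0.5) / 2
for n, x in enumerate(newton_golden()):
    print(n, x, abs(x - golden))   # the error column shrinks very rapidly
```

Near a simple root Newton's method converges quadratically, so the error roughly squares at each step until it reaches the limits of floating-point precision.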
Notational Preliminaries


In this book, we use the following fairly standard notation: 


$\mathbb{Z}$ is the set of integers.

$\mathbb{Z}_m$ is the set of integers modulo $m$.

$\mathbb{Z}^k$ is the set of all $k$-tuples with integer coordinates.

$\mathbb{Z}_m^k$ is the set of all $k$-tuples of integers modulo $m$.

$\mathbb{N}$ is the set of natural numbers, including 0; $\mathbb{N} = \{0, 1, 2, \dots\}$.

$\mathbb{N}^+$ is the set of positive integers.

$\mathbb{Q}$ is the set of rational numbers.

$\mathbb{R}$ is the set of real numbers.

$\mathbb{F}$ denotes a finite field.

$\mathbb{C}$ is the set of complex numbers.

$\mathbb{R}[x]$ is the set of polynomials with real coefficients.

$\mathbb{C}[x]$ is the set of polynomials with complex coefficients.

$\mathbb{Z}_m[x]$ is the set of polynomials whose coefficients are integers modulo $m$.

$\mathbb{F}[x]$ is the set of polynomials with coefficients from the finite field $\mathbb{F}$.

$\lfloor x \rfloor$ is the floor of $x \in \mathbb{R}$, the largest integer $n$ with $n \le x$.

$\lceil x \rceil$ is the ceiling of $x \in \mathbb{R}$, the smallest integer $n$ with $n \ge x$.

$k \pmod{m}$ means the equivalence class $\{k + jm : j \in \mathbb{Z}\}$, while
$k \bmod m$ means the least nonnegative integer in the class $k \pmod{m}$.
Fibonacci Numbers


This chapter is devoted to the Fibonacci numbers. We start with the famil- 
iar definition, move on to some more sophisticated points of view, and then 
formulate some questions that are typical of those that can be addressed 
using the material of this book. 


1.1 The Rabbit Problem 


In the year 1202 the Italian mathematician Leonardo Pisano (which means 
Leonardo of Pisa) published Liber Abaci,¹ a book of problems whose
purpose was to illustrate the usefulness of Arabic numerals in arithmetic
computations because at that time cumbersome Roman numerals were still
being used in Italy. One of the problems discussed in Pisano’s book consid- 
ers pairs of breeding rabbits. Each pair of rabbits matures in two months 
and produces one new pair each month thereafter, beginning with the last 
day of its second month. Starting with a single infant pair born at the be- 
ginning of Month 0, how many pairs will there be one year after this pair 
begins breeding? We can find our way to a solution by considering what 
happens in the first few months: 


1. At the end of Month 0 there is only one pair, and they are not yet 
breeding. 


¹The book was reprinted in 1857-1862 by Baldassarre Boncompagni [11]. A
translation by L. Sigler [147] has recently been published by Springer-Verlag.
2. At the end of Month 1 the first pair gives birth to a second pair.


3. At the end of Month 2 the second pair is one month old and not yet 
breeding. The first pair produces another pair, giving three pairs at 
the end of Month 2. 


4. By the end of Month 3, the first pair has produced yet another pair, 
and the second pair has given birth to its first progeny. Therefore, 
there are five pairs at the end of Month 3. 


5. For Month 4, the two new pairs born the previous month are not 
bearing young, but all three older pairs give birth to one new pair. 
Eight pairs are alive at the end of Month 4. 


We observe that at the end of the $n$th month each pair that was alive
at the end of the $(n-2)$nd month has given birth to a new pair during the month.
Therefore, the total number of pairs alive at the end of the $n$th month is
the number from the $(n-1)$st month plus one new pair for every pair that
was alive at the end of the $(n-2)$nd month. In other words, the number of
rabbit pairs at the end of $n$ months is the sum of the numbers at the end
of the two previous months.

The numbers generated by this most famous example of a recurrence
relation (which we often simply call a “recurrence”) are called the Fibonacci
numbers, because Pisano is usually referred to as Fibonacci, which means
son of Bonaccio. These Fibonacci numbers are given by the sequence

$1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, \dots,$

whose first two terms are 1, 2 and each subsequent term is the sum of the
preceding two terms. The Rabbit Problem asks for the thirteenth element
of this sequence, which is the number 377. If we are asked for the number of
rabbits after one thousand months, could we compute the 1001st Fibonacci
number without computing all 1000 numbers that came before? Examination
of some slightly more sophisticated ways to view the Fibonacci
numbers will provide an answer to this question.
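A month-by-month simulation makes the counting above concrete. The Python sketch below is our own illustration, not the book's: it assumes the reading of the problem given in the numbered list (the initial pair has completed one month at the end of Month 0, and a pair breeds at the end of every month once it has completed two months) and tracks the age of every pair explicitly. The function name `rabbit_pairs` is hypothetical.

```python
def rabbit_pairs(last_month):
    """Simulate the Rabbit Problem month by month.

    Returns a list whose entry n is the number of pairs alive at the end
    of Month n, for n = 0, 1, ..., last_month.
    """
    # Ages (in completed months) of the pairs alive at the end of Month 0:
    # the initial pair, born at the beginning of Month 0, has lived one month.
    ages = [1]
    counts = [len(ages)]
    for _ in range(last_month):
        ages = [a + 1 for a in ages]               # every pair ages one month
        births = sum(1 for a in ages if a >= 2)    # pairs with two or more months breed
        ages.extend([0] * births)                  # newborn pairs have age 0
        counts.append(len(ages))
    return counts


print(rabbit_pairs(12))
```

The printed list runs 1, 2, 3, 5, 8, ... and ends with 377 at the end of Month 12, the thirteenth element. Tracking every pair individually is deliberately naive: the list of ages grows like the Fibonacci numbers themselves, which is one reason to work with the recurrence instead.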


1.2 The Fibonacci Sequence 


In calculus we often think of a function such as $f(x) = x^2$ as a simpler
object than an infinite sequence, but a sequence $s_0, s_1, s_2, \dots$ of complex
numbers is really just a function

$s : \mathbb{N} \to \mathbb{C},$

where $s_n$ is the value of the function $s$ at $n$. In this book we will denote
elements of sequences by using subscripts and use $(s_n)$ as shorthand for
the entire sequence. We will also define a sequence by giving its “generic
term,” so $(n^2)$ is the sequence $(s_n)$ with $s_n = n^2$ for all $n \in \mathbb{N}$.

We will slightly modify the Fibonacci sequence, and from now on define 
the first two Fibonacci numbers to be 0 and 1, while each subsequent 
Fibonacci number is still the sum of its two immediate predecessors. Such 
a rule, which defines the elements of a sequence by a formula involving 
some fixed number (in this case two) of preceding elements, is called a 
recurrence relation. The Fibonacci sequence is defined by the linear 
recurrence 


(1.1) $f_0 = 0, \quad f_1 = 1,$
(1.2) $f_{n+2} = f_{n+1} + f_n.$


The recurrence itself is (1.2), and the two values given in (1.1) are called 
the initial conditions. The system (1.2) and (1.1) is referred to as an 
initial value problem. The equation in (1.2) is called linear because the 
terms of the sequence are connected in a linear manner, using only linear 
combinations of previous elements of the sequence. (In general, scaling by 
real or complex constants is allowed.) 

As we said above, it’s often helpful to think of a sequence as a function 
f defined on the set of natural numbers N. The Fibonacci sequence is the 
function f(n) = f, with 


(1.3) $f(0) = 0, \quad f(1) = 1,$
(1.4) $f(n+2) = f(n+1) + f(n).$


Here we begin the sequence at the position f(0) because verifying and 
developing formulas is often easier when we initialize with f(0) rather than 
with f(1). 

When we define the Fibonacci sequence it’s customary to think of the 
index $n$ in (1.2) and (1.4) as a natural number, but there’s no purely
mathematical reason to limit ourselves to nonnegative $n$. From any two
consecutive Fibonacci numbers the recurrence (1.2) allows us to compute the
preceding element of the Fibonacci sequence using $f_n = f_{n+2} - f_{n+1}$.
Therefore, we can proceed backwards as well as forwards, and any integer can
be used as an index. Some “center” terms of the associated doubly infinite 
Fibonacci sequence are 


$\dots, -8, 5, -3, 2, -1, 1, 0, 1, 1, 2, 3, 5, 8, \dots$ (the central 0 is $f(0)$)


(Exercise 1.6 asks you to verify this pattern.) Extending recurrences to in- 
clude negative indices can have some advantages, sometimes giving a deeper 
insight into the theory. It’s especially helpful for modeling population dy- 
namics and other time-dependent phenomena. 
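Running the recurrence backwards is as easy as running it forwards. The Python sketch below (our own; `fib_range` is a hypothetical name) fills in $f(n)$ on both sides of 0, using (1.2) forwards and its rearrangement $f_n = f_{n+2} - f_{n+1}$ backwards. It is a convenient way to look at small cases of the sign pattern in Exercise 1.6, though of course it proves nothing.

```python
def fib_range(lo, hi):
    """Return {n: f(n)} for lo <= n <= hi, assuming lo <= 0 <= hi."""
    f = {0: 0, 1: 1}
    for n in range(2, hi + 1):           # forwards: f(n) = f(n-1) + f(n-2)
        f[n] = f[n - 1] + f[n - 2]
    for n in range(-1, lo - 1, -1):      # backwards: f(n) = f(n+2) - f(n+1)
        f[n] = f[n + 2] - f[n + 1]
    return {n: f[n] for n in range(lo, hi + 1)}


values = fib_range(-6, 6)
print([values[n] for n in range(-6, 7)])
# prints [-8, 5, -3, 2, -1, 1, 0, 1, 1, 2, 3, 5, 8]
```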
1.2.1 Computing Fibonacci numbers


The definition of the Fibonacci function given in (1.4) is called a recursive 
definition. It allows us to write a short recursive computer program to 
compute (at least in theory) the value of the Fibonacci function at any 
given nonnegative n. 


PROCEDURE RECRFIB(n, f)   (* Input n, Output f(n) *)
   IF n ≤ 1
   THEN f := n
   ELSE RECRFIB(n − 1, g)
        RECRFIB(n − 2, h)
        f := g + h


Another style of program for computing f(n) for nonnegative n is given by 


PROCEDURE FIB(n, f)   (* Input n, Output f(n) *)
   IF n ≤ 1
   THEN f := n
   ELSE j := 0
        f := 1
        FOR i := 2 TO n DO
           t := f
           f := f + j
           j := t


The first procedure, RECRFIB, computes values of the Fibonacci function 
by making calls to itself. Such a program is usually called recursive. Pro- 
gram FIB does not call on itself, but computes by repeatedly executing the 
statements in the FOR loop. Such a program is usually called iterative. 
Once we know some program that computes a given function, it is natural 
to ask for a program that computes the function rapidly. Finding such 
a program is not as easy as it sounds. For instance, for the Fibonacci 
function, we might want to know how quickly we can produce a single term 
f(n) in terms of the size of the input n, or we might want to know how 
quickly we can compute all of the first n terms of the Fibonacci sequence in 
terms of the input size. Or we might want to answer these same questions 
in the context of constraints on the size of the memory of the computer 
used for the computation. These may be realistic or simply theoretical
constraints. More effective procedures for computing Fibonacci numbers 
are given in [40]. 
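A direct Python translation of the two procedures RECRFIB and FIB makes the difference in cost visible. In this sketch the names `recr_fib` and `fib` and the optional call counter are our own additions. The number of calls $C(n)$ made by the recursive version satisfies $C(n) = 1 + C(n-1) + C(n-2)$ with $C(0) = C(1) = 1$, whose solution is $C(n) = 2f(n+1) - 1$, while the iterative version runs its loop only $n - 1$ times.

```python
def recr_fib(n, counter=None):
    """Recursive translation of RECRFIB, for n >= 0."""
    if counter is not None:
        counter[0] += 1                  # count every procedure call
    if n <= 1:
        return n
    return recr_fib(n - 1, counter) + recr_fib(n - 2, counter)


def fib(n):
    """Iterative translation of FIB, for n >= 0: j holds f(i-1), f holds f(i)."""
    if n <= 1:
        return n
    j, f = 0, 1
    for _ in range(2, n + 1):
        j, f = f, f + j                  # tuple assignment plays the role of t
    return f


calls = [0]
print(recr_fib(20, calls), fib(20), calls[0])   # prints: 6765 6765 21891
```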


1.2.2 A formula for the Fibonacci numbers 


Although the defining recurrence (1.2) theoretically allows us to compute 
f(n) for any n, it’s not an especially tidy mathematical expression for 
f(n), since all previous terms are needed in order to calculate just one 
term of the Fibonacci sequence. For example, think about the question of 
quickly estimating the number of rabbit pairs after one thousand months 
of rabbit breeding. If the function f had a nice formula, it would be easier 
to answer such a question. Fortunately, and perhaps surprisingly, there is 
such a formula for values of the Fibonacci function: 


(1.5) $f(n) = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n} - \left(\frac{1-\sqrt{5}}{2}\right)^{n}\right].$

You might recognize the number $(1+\sqrt{5})/2$; it’s often referred to as the
golden mean or golden section. This formula (1.5) is known as Binet’s
Formula and can be derived using the techniques given in the next
chapter (refer to Exercise 2.10). Although it is usually attributed to Jacques
Philippe Marie Binet (1786-1856), Donald Knuth [88, vol. 1, p. 82] says that
Abraham de Moivre reported the formula in 1730 [48, pp. 26-42] when he 
considered the general linear recurrence. 

For now, we can appreciate the power of Binet’s Formula by using it to
estimate the size of the 1000th Fibonacci number. Since

$\frac{1+\sqrt{5}}{2} \approx 1.618$ and $\frac{1-\sqrt{5}}{2} \approx -0.618,$

and $(0.618)^{1000}$ is very small, then

$f(1000) \approx \frac{(1.618)^{1000}}{\sqrt{5}}.$


If we’re interested in estimating the size of f(1000) in terms of the number
of its decimal digits, we can use the base-10 logarithm. (Refer to
Exercise 1.9.) Since

$1000 \cdot \log(1.618) - 0.5\log(5) \approx 208.629,$
we expect the number of decimal digits in f(1000) to be about 209 (a positive
integer whose base-10 logarithm lies in [208, 209) has 209 digits). In fact,
the 1000th Fibonacci number is the number


f(1000) = 43466557686937456435688527675040625802564660517371 
78040248172908953655541794905189040387984007925516
92959225930803226347752096896232398733224711616429 
96440906533187938298969649928516003704476137795166 
849228875, 


which does have the predicted 209 digits. 
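The estimate is easy to check against exact arithmetic. The Python sketch below (our own; `fib` is a hypothetical helper) relies on Python's built-in arbitrary-precision integers to compute f(1000) exactly, then compares its digit count with the Binet-based estimate, taking $\lfloor \log_{10} x \rfloor + 1$ as the number of digits of a positive integer $x$.

```python
import math


def fib(n):
    """Exact f(n) for n >= 0 (Python integers have unlimited precision)."""
    j, f = 0, 1                  # invariant: j = f(i), f = f(i+1), starting at i = 0
    for _ in range(n):
        j, f = f, f + j
    return j


f1000 = fib(1000)

# Binet-based estimate of log10 f(1000), as in the text: 1000*log(1.618) - 0.5*log(5)
estimate = 1000 * math.log10(1.618) - 0.5 * math.log10(5)
predicted_digits = math.floor(estimate) + 1     # digits of x > 0 are floor(log10 x) + 1

print(f"estimate of log10 f(1000): {estimate:.3f}")
print("predicted digits:", predicted_digits)
print("actual digits:   ", len(str(f1000)))
print("leading digits:  ", str(f1000)[:20])
```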
With $\lambda_0 = (1+\sqrt{5})/2$, $\lambda_1 = (1-\sqrt{5})/2$, and $a = 1/\sqrt{5}$, Binet’s Formula can be
rewritten as

$f(n) = a(\lambda_0^n - \lambda_1^n).$

Since $|\lambda_1| < 1$ and $a < 1/2$, then $|f_n - a\lambda_0^n| = a|\lambda_1|^n < 1/2$, and the fact
that $f_n$ is an integer therefore gives the pleasant identity

(1.6) $f_n = \mathrm{Round}(\lambda_0^n/\sqrt{5})$ for all $n \ge 0$,

where Round(X) returns the integer nearest to X.² We analyze this notion
of roundability in Chapter 5. (If you want to look ahead, the main result
is Theorem 5.2.2.)
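Identity (1.6) can be tested numerically, but only with enough precision: in ordinary double-precision floating point, $\lambda_0^n/\sqrt{5}$ eventually carries fewer correct digits than $f(n)$ has, and rounding then gives wrong answers. The sketch below (our own, with hypothetical names) sidesteps this with Python's `decimal` module set to 50 significant digits, an assumption that is ample for the range $n \le 150$ tested here, and compares against exact integer arithmetic.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50            # 50 significant digits (a global decimal setting)
SQRT5 = Decimal(5).sqrt()
LAMBDA0 = (1 + SQRT5) / 2


def fib_by_rounding(n):
    """f(n) via (1.6): the integer nearest to lambda0^n / sqrt(5)."""
    return int((LAMBDA0 ** n / SQRT5).to_integral_value())


def fib(n):
    """Exact f(n) by iteration, for comparison."""
    j, f = 0, 1
    for _ in range(n):
        j, f = f, f + j
    return j


mismatches = [n for n in range(151) if fib_by_rounding(n) != fib(n)]
print(mismatches)   # an empty list means (1.6) held for every n tested
```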


1.2.3 Further Fibonacci facts 


The beauty and arcana of Fibonacci numbers are studied in a number of 
places. We would be remiss if we did not mention the Fibonacci Quarterly, 
a journal that has been publishing for about half a century. One of the 
mainstays of this journal was Brother Alfred. His book [16] is a nice in- 
troduction to Fibonacci numbers. From the Russian literature, Vorobev’s 
book [164] is a concise introduction to Fibonacci facts and formulas. A 
more extensive recent book is Fibonacci and Lucas Numbers [90]. For some 
of the minor arcana, see our paper [22]. 


1.3 Notation for Asymptotic Analysis 


There are often many algorithms for the same problem. For instance, in 
the case of computing Fibonacci numbers we’ve already written two types 
of algorithms for computing a specific Fibonacci number, either using the 
definition directly or using Binet’s Formula. Our usual method of assess- 
ing the efficiency of various algorithms for a problem will be to compare 


²Note that Round(k/2) is not defined when k is an odd integer.
their run times, and usually this is done with an asymptotic analysis. For
this there are three standard forms of notation, Big-Oh, Big-Omega,
and Big-Theta. Each notation removes unimportant details so we can see
the growth of the run time more clearly. This notation was codified for computer
scientists by Donald Knuth [85, 86] in 1976, when he drew upon notation
already used in analytic number theory. Big-Oh³ was introduced by
Bachmann in 1894, and something very similar to Big-Omega was used by Hardy
and Littlewood in 1914. More information on the history of this notation 
(and that of Big-Theta from the 1960s) can be found in Knuth’s article. 
Although our principal use of this notation is for positive real numbers, our 
definitions allow T(n) to be a complex-valued sequence. 


Let $T(n)$ and $f(n)$ be two complex-valued sequences. Then

$T(n) = O(f(n))$ (we say $T(n)$ has order at most $f(n)$) means that there
exists a positive constant $c$ such that

$|T(n)| \le c|f(n)|$ for all sufficiently large $n$.

$T(n) = \Omega(f(n))$ (we say $T(n)$ has order at least $f(n)$) means that there
exists a positive constant $c$ such that

$|T(n)| \ge c|f(n)|$ for all sufficiently large $n$.

$T(n) = \Theta(f(n))$ (we say $T(n)$ has order exactly $f(n)$) means that there
exist positive constants $c_1, c_2$ such that

$c_1|f(n)| \le |T(n)| \le c_2|f(n)|$ for all sufficiently large $n$.


For instance, from (1.6) we know that the asymptotic size of $f_n$ is $\Theta(\lambda_0^n)$
where $\lambda_0 = (1+\sqrt{5})/2$. To get some practice with this notation, you should
verify that for any fixed pair of positive integers $i \le j$, each of the following
holds:

$n^i = O(n^j), \quad n^j = \Omega(n^i), \quad n^i = \Theta(n^j)$ implies $i = j$.

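A quick numerical look at the ratio $f(n)/\lambda_0^n$ illustrates what $f_n = \Theta(\lambda_0^n)$ means: the ratio stays between two positive constants. By (1.5) the ratio equals $(1 - (\lambda_1/\lambda_0)^n)/\sqrt{5}$, so it tends to $1/\sqrt{5} \approx 0.4472$. A Python sketch (our own; floating point is adequate for the modest values of $n$ used):

```python
import math

LAMBDA0 = (1 + math.sqrt(5)) / 2


def fib(n):
    """Exact f(n) for n >= 0."""
    j, f = 0, 1
    for _ in range(n):
        j, f = f, f + j
    return j


# If f(n) = Theta(lambda0^n), these ratios stay between two positive constants;
# by (1.5) they actually approach 1/sqrt(5).
for n in (5, 10, 20, 40):
    print(n, fib(n) / LAMBDA0 ** n)
print("1/sqrt(5) =", 1 / math.sqrt(5))
```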
1.4 Exercises 


Ex 1.1. (Taken from Liber Abaci, pp. 283 ff.) Translate the following:
“Quot paria coniculorum in uno anno ex uno pario germinentur.”


Qvidam posuit unum par cuniculorum in quodam loco, qui erat 


³The term Big-Omicron is used by Knuth, but most other authors call this Big-Oh
notation.
undique pariete circundatus, ut sciret, quot ex eo paria
germinarentur in uno anno: cum natura eorum sit per singulum 
mensem aliud par germinare; et in secundo mense ab eorum 
natiuitate germinant. Quia suprascriptum par in primo mense 
germinat, duplicabis ipsum, erunt paria duo in uno mense. 

Ex quibus unum, scilicet primum, in secundo mense germinat; et 
sic sunt in secundo mense paria 3; ex quibus in uno mense duo 
pregnantur; et germinatur in tercio mense paria 2 conciculorum; 
et sic sunt paria 5 in ipso mense;... Cum quibus etiam additis 
parijs 144, que germinatur in ultimo mense, erunt paria 377; 
et tot paria peperit suprascriptum par in prefato loco in 
capite unius anni. 


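Ex 1.2. Let $x \in \mathbb{R}$. Let $\lfloor x \rfloor$ be the floor of $x$, that is, the largest integer
$n$ such that $n \le x$. Let $\lceil x \rceil$ be the ceiling of $x$, that is, the least integer $n$
such that $x \le n$. Show that $\lfloor x \rfloor = \lceil x \rceil$ iff $x \in \mathbb{Z}$.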


Ex 1.3. Suppose $(s_n)$ is any sequence that satisfies the Fibonacci recurrence
(1.2) but possibly has different initial values. Show that for any $j$,
any term $s_n$ can be written as a linear combination of $s_j, s_{j-1}$ with integer
coefficients.


Ex 1.4. Suppose $(s_n)$ is any sequence that satisfies the Fibonacci recurrence
(1.2) but possibly has different initial values. Let $n_1, n_2 \in \mathbb{N}$ be any
fixed distinct indices. Show that any term $s_n$ can be written as a linear combination
of $s_{n_1}, s_{n_2}$ with rational coefficients.


Ex 1.5. For the Fibonacci sequence, let $n_1, n_2 \in \mathbb{N}$ be any fixed indices
with $\gcd(n_1, n_2) = 1$. Show that any term $f_n$ can be written as a linear
combination of $f_{n_1}, f_{n_2}$ with integer coefficients.


Ex 1.6. Use (1.2) to show that for the Fibonacci function we have
$f(-n) = (-1)^{n+1} f(n)$ for all $n \ge 0$.


Ex 1.7. Verify Binet’s Formula (1.5) for f(2), f(3), f(4). 


Ex 1.8. Use mathematical induction to prove that every element of the 
Fibonacci sequence satisfies Binet’s Formula. 


Ex 1.9. Show that the base-10 logarithm of each of the integers 1, 2, ..., 9
lies in the interval [0, 1), and that the logarithm of any two-digit integer
lies in the interval [1, 2). In general, show that the number of decimal digits in
a positive integer $n$ is $\lfloor \log_{10} n \rfloor + 1$.


Ex 1.10. If the base-10 logarithm of an integer begins with 3.53, how many 
digits does the integer have and what’s its first digit? 
Ex 1.11. This exercise deals with the sequence of Lucas Numbers.⁴
Consider the sequence $(L_n)$ generated by

$L_0 = 2, \quad L_1 = 1,$
$L_{n+2} = L_{n+1} + L_n.$

(a) Calculate $L_4, L_5, L_6$.
(b) Find constants $A, B \in \mathbb{R}$ such that for each of $n = 0, 1$ it is true that

(1.7) $L_n = A\left(\frac{1+\sqrt{5}}{2}\right)^n + B\left(\frac{1-\sqrt{5}}{2}\right)^n.$

(c) Use mathematical induction to prove that your formula in the previous
part holds for all $n \ge 0$.


Ex 1.12. (a) Write a short program that computes the Fibonacci and
the Lucas sequences.
(b) Use your program to calculate $f_{30}$ and $L_{30}$. Check your answer using
the closed forms in (1.5) and (1.7). Compute the ratio $L_{30}/f_{30}$.
(c) The ratio $L_n/f_n$ has a limiting value. Use your program to calculate this
value to ten decimal places.

Ex 1.13. For this exercise, consider the sequence $(s_n)$ defined by

$s_0 = 0, \quad s_1 = 1,$
$s_{n+2} = 2s_{n+1} + 2s_n.$

(a) Calculate the first five terms of the sequence $(s_n)$.
(b) Check that the following formula correctly calculates $s_2, s_3, s_4$:

$s_n = \frac{1}{2\sqrt{3}}\left[(1+\sqrt{3})^n - (1-\sqrt{3})^n\right].$

(c) Show that the general $n$th term of the sequence $(s_n)$ satisfies the
formula in the previous part.


Ex 1.14. Show that the variable $t$ in the procedure FIB can be eliminated
by using $f := f + j$, $j := f - j$.


⁴The French mathematician Édouard Lucas (1842-1891) studied the properties of
this and other sequences in the first volume (page 186) of the American Journal of
Mathematics, which appeared in 1878.
Ex 1.15. Show that every natural number $n$ can be expressed in a “binary”
form $n = \sum_{i=2}^{K} b_i f_i$, where each $b_i$ is either 0 or 1 and $K$ depends on
the natural number $n$. Show further that this binary Fibonacci representation
is NOT unique. If you impose the additional stipulation that no two
consecutive $b_i$ are both 1, show that the binary Fibonacci representation is
unique.


Ex 1.16. Define $\log_{\mathrm{Fib}}(n)$ to equal the least index $i$ for which $f_i \ge n$.
(a) Calculate some values of $\log_{\mathrm{Fib}}(n)$.
(b) For each $i \ge 1$, define $\#(i)$ to be the number of natural numbers $n$
for which $\log_{\mathrm{Fib}}(n) = i$. Show that $\#(i) = f_{i-2}$ for all $i \ge 3$.


Ex 1.17. Show that any solution to

$s_n = s_{n-1} + \lfloor \sqrt{n} \rfloor$

with nonnegative initial conditions satisfies $s_n = O(n^{3/2})$.


2 


Homogeneous Linear Recurrence Relations


The simplest type of recurrence relation is the homogeneous linear
recurrence with constant coefficients, one in which $s_{n+k}$ is given as a linear
function of $s_n, \dots, s_{n+k-1}$. In other words, for all $n \ge 0$,

(HL) $s_{n+k} = c_1 s_{n+k-1} + c_2 s_{n+k-2} + \cdots + c_k s_n,$

where $c_1, \dots, c_k$ are complex constants and $c_k \ne 0$. This is called a $k$th order
homogeneous linear recurrence with constant coefficients. The purpose of
this chapter is to analyze these recurrences using tools from linear algebra. 
Appendix C contains a review of the basic linear algebra that we will assume 
here. Even if we were only interested in integer recurrences, we would 
still need to consider recurrences whose coefficients are complex numbers 
because this is more or less forced on us by the algebra. In addition, more 
compact formulas can often be obtained by using more general number 
systems. 

Equation (HL) with given values $s_0, \dots, s_{k-1}$ is called an initial value
problem. For any set of $k$ initial values, the recurrence (HL) can be
successively applied to compute the infinite sequence $(s_n)$ that satisfies this
recurrence. From $s_0, \dots, s_{k-1}$ the $s_k$ term is specified by $s_k = c_1 s_{k-1} + \cdots + c_k s_0$,
the $s_{k+1}$ term by $s_{k+1} = c_1 s_k + \cdots + c_k s_1$, and so on. What this informally
shows is that every initial value problem has a solution, and the solution is 
unique. The first k terms determine the rest of the sequence. In Exercise 2.1 
we ask you to supply a formal verification of this fact, which is fundamental 
to the remainder of this chapter. 
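The informal argument above is already an algorithm: keep the most recent $k$ terms and apply (HL) over and over. A minimal Python sketch, assuming the coefficients are passed as a list `[c_1, ..., c_k]` and the initial values as `[s_0, ..., s_{k-1}]` (the name `hl_terms` and this argument layout are our own choices):

```python
def hl_terms(coeffs, initial, count):
    """First `count` terms s_0, s_1, ... of the kth order recurrence (HL).

    coeffs  = [c_1, ..., c_k], meaning s_{n+k} = c_1 s_{n+k-1} + ... + c_k s_n
    initial = [s_0, ..., s_{k-1}]
    Works for integer, rational, or complex values alike.
    """
    k = len(coeffs)
    if len(initial) != k:
        raise ValueError("need exactly k initial values")
    s = list(initial)
    while len(s) < count:
        # s[-1] is s_{n+k-1}, s[-2] is s_{n+k-2}, ..., s[-k] is s_n
        s.append(sum(c * s[-i] for i, c in enumerate(coeffs, start=1)))
    return s[:count]


print(hl_terms([1, 1], [0, 1], 10))    # Fibonacci: c_1 = c_2 = 1
print(hl_terms([3, -2], [2, 3], 10))   # the recurrence (2.1) of the next section
```

Each new term costs $k$ multiplications, so the first $N$ terms cost about $kN$ arithmetic operations.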
Arithmetic Operations on Sequences

We can add and scale sequences in exactly the same way as functions.
This means that $(s_n) + (t_n) = (s_n + t_n)$. Viewed as functions, this is
the rule $(s + t)(n) = s(n) + t(n)$. A sequence is multiplied by the scalar
$\lambda \in \mathbb{C}$ using the rule $\lambda(s_n) = (\lambda s_n)$. The set of all functions from $\mathbb{N}$ into
$\mathbb{C}$ (which is the same as the set of all complex sequences) is a complex
vector space under these operations of addition and scalar multiplication.


2.1 The Solution Space of (HL) 


We begin by analyzing the second-order initial value problem

(2.1) $s_{n+2} = 3s_{n+1} - 2s_n, \quad s_0 = 2, \quad s_1 = 3,$

which generates the sequence

$2, 3, 5, 9, 17, 33, 65, 129, 257, 513, \dots,$

and has the closed formula $s_n = 2^n + 1$. We might ask what the sequence
looks like under other initial conditions. Here are two such sequences:

$2, 5, 11, 23, 47, 95, 191, 383, 767, 1535, \dots,$
$3, 1, -3, -11, -27, -59, -123, -251, -507, -1019, \dots.$


It’s not quite as easy to guess a formula for the $n$th term of the sequence
starting with 2, 5, but you might be lucky and notice that

$23 = 24 - 1 = 3 \cdot 2^3 - 1,$
$47 = 48 - 1 = 3 \cdot 2^4 - 1,$
$95 = 96 - 1 = 3 \cdot 2^5 - 1,$

and so arrive at the formula $s_n = 3 \cdot 2^n - 1$. By thinking negatively, we
can find the formula $s_n = 5 - 2^{n+1}$ for the third set of initial conditions
$s_0 = 3$, $s_1 = 1$. (In Exercise 2.2 you’re asked to prove these formulas by an
inductive argument.) Since we used the same recurrence (2.1) to generate 
all three sequences, it’s not surprising that there are similarities among 
the formulas. For instance, all three formulas involve powers of 2. One 
of our goals is to understand how and why the sequences corresponding to 
different initial values are similar. To put this another way, we’re interested 
in understanding the structure of the space of all solutions to a fixed $k$th
order homogeneous linear recurrence when the initial conditions range over 
all k-tuples of complex numbers. 
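The three sequences and their closed forms can be spot-checked by machine. This is evidence only, not the inductive proof that Exercise 2.2 asks for. A Python sketch (the names `terms_21` and `closed_forms` are hypothetical):

```python
def terms_21(s0, s1, count):
    """First `count` terms of s_{n+2} = 3 s_{n+1} - 2 s_n, as in (2.1)."""
    s = [s0, s1]
    while len(s) < count:
        s.append(3 * s[-1] - 2 * s[-2])
    return s[:count]


# Initial conditions (s_0, s_1) paired with the closed forms found in the text.
closed_forms = {
    (2, 3): lambda n: 2 ** n + 1,
    (2, 5): lambda n: 3 * 2 ** n - 1,
    (3, 1): lambda n: 5 - 2 ** (n + 1),
}

for (s0, s1), formula in closed_forms.items():
    seq = terms_21(s0, s1, 10)
    agrees = all(seq[n] == formula(n) for n in range(10))
    print((s0, s1), seq, agrees)
```

All three formulas are built from the same two sequences $(2^n)$ and $(1)$, which is exactly the kind of shared structure the rest of this section explains.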


Placing this in a more general setting, we’ve observed already that the set 
of all complex sequences forms a vector space over C. In Exercise 2.5 you’re 
asked to prove that the set of solutions to (HL) is a subspace of the space 
of all complex sequences, and this means that every linear combination of
solutions is also a solution. We’ll call this the solution space of (HL)
and will denote it by $\mathcal{X}$.
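The subspace property can be seen on an example: take two solutions of (2.1), form a linear combination, and check the recurrence term by term. The Python sketch below uses exact rational scalars from the standard `fractions` module so that the equality tests are exact; the particular scalars are arbitrary, and complex scalars would behave the same way mathematically. Function names are hypothetical.

```python
from fractions import Fraction


def terms_21(s0, s1, count):
    """First `count` terms of s_{n+2} = 3 s_{n+1} - 2 s_n, as in (2.1)."""
    s = [s0, s1]
    while len(s) < count:
        s.append(3 * s[-1] - 2 * s[-2])
    return s[:count]


def satisfies_21(seq):
    """True if seq obeys s_{n+2} = 3 s_{n+1} - 2 s_n at every available index."""
    return all(seq[n + 2] == 3 * seq[n + 1] - 2 * seq[n] for n in range(len(seq) - 2))


x = terms_21(2, 3, 12)                            # s_n = 2^n + 1
y = terms_21(3, 1, 12)                            # s_n = 5 - 2^(n+1)
alpha, beta = Fraction(1, 2), Fraction(-7, 3)     # arbitrary scalars
combo = [alpha * a + beta * b for a, b in zip(x, y)]   # alpha*(x_n) + beta*(y_n)
print(satisfies_21(combo))                        # prints True
```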

We’ve seen that any sequence in $\mathcal{X}$ is completely determined by its first
$k$ terms. This can be neatly expressed by defining a map $\pi$ that picks out
these terms and writes them as a vector in $\mathbb{C}^k$. Accordingly, we define

(2.2) $\pi : \mathcal{X} \to \mathbb{C}^k$ by $\pi((s_n)) = (s_{k-1}, \dots, s_1, s_0)^T.$

Because every sequence in $\mathcal{X}$ has a unique string of $k$ initial elements,
this map is a well-defined function. The fact that any choice of values
$s_0, \dots, s_{k-1}$ can be extended to an infinite sequence that is in $\mathcal{X}$ means
that $\pi$ is an onto function. The companion fact that every initial value
problem has a unique solution translates to the statement that $\pi$ is a
one-to-one function (that is, if $\pi(x) = \pi(y)$ for $x, y \in \mathcal{X}$, then $x = y$). Since
$\pi$ is one-to-one and onto, it is called a bijection or a bijective function,
and $\pi$ has a (two-sided) inverse. This inverse is the map that assigns to
any vector $a = (a_{k-1}, \dots, a_0)^T \in \mathbb{C}^k$ the unique solution in $\mathcal{X}$ that begins
with the initial values

$s_0 = a_0, \quad s_1 = a_1, \quad \dots, \quad s_{k-1} = a_{k-1}.$


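For any two solutions $(x_n), (y_n) \in \mathcal{X}$ and complex scalars $\alpha$ and $\beta$, we 
have¹ 

$$
\alpha * \pi((x_n)) + \beta * \pi((y_n))
= \alpha * \begin{pmatrix} x_{k-1} \\ \vdots \\ x_1 \\ x_0 \end{pmatrix}
+ \beta * \begin{pmatrix} y_{k-1} \\ \vdots \\ y_1 \\ y_0 \end{pmatrix}
= \begin{pmatrix} \alpha * x_{k-1} + \beta * y_{k-1} \\ \vdots \\ \alpha * x_1 + \beta * y_1 \\ \alpha * x_0 + \beta * y_0 \end{pmatrix}
= \pi(\alpha * (x_n) + \beta * (y_n)),
$$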


which shows that $\pi: \mathcal{X} \to \mathbb{C}^k$ is a linear transformation. Since we’ve 
already shown that it is a bijection, $\pi$ is an invertible transformation that 
is often called a vector space isomorphism. From linear algebra we know 
that the inverse of an invertible linear transformation is also linear, and we 
obtain the following theorem. 

¹Good notation often leads to seductively simple formulas that can obscure some 
important details, so it is good to form the habit of making a mental note of multiple 
uses of the same symbol. In this regard, observe that the symbol “+” has three different 
meanings in the equation displayed above: first as addition of vectors in $\mathbb{C}^k$, second as 
addition of complex numbers, and finally as addition of sequences. Check that you can 
also find the three different uses of the symbol “$*$”. 


Theorem 2.1.1. Let $\mathcal{X}$ be the solution space of a $k^{\text{th}}$ order homogeneous 
linear recurrence. Then the map $\pi$ defined in (2.2) is an isomorphism 
between the vector spaces $\mathcal{X}$ and $\mathbb{C}^k$. In particular, $\mathcal{X}$ is a $k$-dimensional 
complex vector space. 


Since any element of a vector space is a (unique) linear combination of 
basis vectors, Theorem 2.1.1 can be interpreted as giving a method for 
constructing solutions $(s_n) \in \mathcal{X}$ using the $k$ sequences generated by the basic 
initial conditions $e_1, e_2, \ldots, e_k$, where $e_i$ is the $i^{\text{th}}$ standard basis vector 
in $\mathbb{C}^k$, the column vector with 1 in the $i^{\text{th}}$ position and zeros elsewhere. 
The preimage $\pi^{-1}(e_i) = (s_n)$ is the solution whose only non-zero initial 
condition is $s_{k-i} = 1$. Since the set $\{e_1, \ldots, e_k\}$ is a basis for $\mathbb{C}^k$ and $\pi^{-1}$ is 
an isomorphism, the set $\{\pi^{-1}(e_1), \ldots, \pi^{-1}(e_k)\}$ is a basis for $\mathcal{X}$. Moreover, 
for given initial conditions 

$$s_0 = a_0, \quad s_1 = a_1, \quad \ldots, \quad s_{k-1} = a_{k-1},$$

the corresponding solution sequence is 

$$a_0\,\pi^{-1}(e_k) + a_1\,\pi^{-1}(e_{k-1}) + \cdots + a_{k-1}\,\pi^{-1}(e_1).$$


We illustrate this with two examples, the Fibonacci sequence and the 
recurrence in (2.1) above. 

For the Fibonacci recurrence, the basic sequences are the two sequences 

$$\pi^{-1}(e_1) = (0, 1, 1, 2, 3, 5, 8, 13, 21, \ldots)$$

and 

$$\pi^{-1}(e_2) = (1, 0, 1, 1, 2, 3, 5, 8, 13, \ldots).$$

If we want to find the sequence generated by the recurrence 

$$s_{n+2} = s_{n+1} + s_n, \qquad s_0 = -1,\ s_1 = 3,$$

we would add 3 times $\pi^{-1}(e_1)$ to $-1$ times the sequence $\pi^{-1}(e_2)$, and this 
gives $-1, 3, 2, 5, 7, 12, \ldots$. 

The Fibonacci numbers enjoy many special properties that are not necessarily 
shared by all recurrences. For instance, the sequence $\pi^{-1}(e_1)$ is simply the 
shift of the sequence $\pi^{-1}(e_2)$ one term to the left. It follows that any 
sequence $(s_n)$ satisfying the Fibonacci recurrence has general term 

$$s_n = a f_n + b f_{n-1} \quad (n \ge 1), \qquad \text{for } a = s_1,\ b = s_0.$$

For recurrence (2.1), the basic sequences are 

$$\pi^{-1}(e_1) = (0, 1, 3, 7, 15, 31, 63, 127, 255, 511, \ldots)$$

and 

$$\pi^{-1}(e_2) = (1, 0, -2, -6, -14, -30, -62, -126, -254, -510, \ldots).$$

Therefore, the ninth term of the solution to (2.1) with $s_0 = 4$, $s_1 = 11$ is 
$s_8 = 4(-254) + 11(255) = 1789$. 
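The basic-sequence method can be checked mechanically. The Python sketch below is only an illustration under our own conventions: coefficients are passed as the list `[c_1, ..., c_k]`, initial values as `[s_0, ..., s_{k-1}]`, and the helper names `extend`, `basic_sequence`, and `solve_via_basis` are hypothetical. It builds $\pi^{-1}(e_i)$ by iterating the recurrence and then forms $a_0\,\pi^{-1}(e_k) + \cdots + a_{k-1}\,\pi^{-1}(e_1)$. We read recurrence (2.1) as $s_{n+2} = 3s_{n+1} - 2s_n$, as its matrix form in Section 2.2 shows.

```python
from typing import List, Sequence


def extend(coeffs: Sequence[int], initial: Sequence[int], n_terms: int) -> List[int]:
    """Iterate s_{n+k} = c_1 s_{n+k-1} + ... + c_k s_n starting from s_0, ..., s_{k-1}."""
    s = list(initial)
    while len(s) < n_terms:
        # coeffs[0] = c_1 multiplies the most recent term, coeffs[-1] = c_k the oldest one
        s.append(sum(c * s[-1 - i] for i, c in enumerate(coeffs)))
    return s[:n_terms]


def basic_sequence(coeffs: Sequence[int], i: int, n_terms: int) -> List[int]:
    """pi^{-1}(e_i): the solution whose only non-zero initial value is s_{k-i} = 1."""
    k = len(coeffs)
    initial = [0] * k
    initial[k - i] = 1
    return extend(coeffs, initial, n_terms)


def solve_via_basis(coeffs: Sequence[int], initial: Sequence[int], n_terms: int) -> List[int]:
    """a_0 pi^{-1}(e_k) + ... + a_{k-1} pi^{-1}(e_1), where a_j = s_j."""
    k = len(coeffs)
    total = [0] * n_terms
    for j, a_j in enumerate(initial):
        basis = basic_sequence(coeffs, k - j, n_terms)  # e_{k-j} carries the value s_j
        total = [t + a_j * b for t, b in zip(total, basis)]
    return total


# Fibonacci with s_0 = -1, s_1 = 3
print(solve_via_basis([1, 1], [-1, 3], 6))
# Recurrence (2.1) with s_0 = 4, s_1 = 11: the ninth term s_8
print(solve_via_basis([3, -2], [4, 11], 9)[8])
```

Running it prints `[-1, 3, 2, 5, 7, 12]` and `1789`, matching the two hand computations above.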


To summarize: Knowledge of the basic sequences of a homogeneous linear 
recurrence allows us to express the $n^{\text{th}}$ term of any solution as a linear 
combination of simpler quantities, namely, the $n^{\text{th}}$ terms of the basic sequences. 
Because this procedure uses $k$ basic sequences to find the one sequence of 
interest, it isn’t very useful unless formulas for the basic sequences can be 
easily obtained. We’d like a basis for $\mathcal{X}$ consisting of sequences that are 
guaranteed to have an easy-to-find and simple formula for their $n^{\text{th}}$ terms. 
Obtaining such a basis is the goal of the rest of this chapter. 


2.2 The Matrix Form 


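Since each term of a $k^{\text{th}}$ order recurrence is determined by the $k$ preceding 
terms, it is useful to think of it as a function on $k$-tuples of consecutive 
terms. Because this function is linear, it can be represented by a matrix. 
For example, the recurrence in (2.1) can be written as 

$$\begin{pmatrix} s_{n+2} \\ s_{n+1} \end{pmatrix} = \begin{pmatrix} 3 & -2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} s_{n+1} \\ s_n \end{pmatrix}, \tag{2.3}$$

while the usual Fibonacci sequence is encoded in the matrix equation 

$$\begin{pmatrix} f_{n+2} \\ f_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f_{n+1} \\ f_n \end{pmatrix}, \qquad \begin{pmatrix} f_1 \\ f_0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \tag{2.4}$$

In each case the matrix takes us from one pair of consecutive terms to the 
next. 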
In general, we express (HL) in matrix form by introducing the vectors 

$$S_n = \begin{pmatrix} s_{n+k-1} \\ \vdots \\ s_{n+1} \\ s_n \end{pmatrix} \quad \text{for all } n \ge 0, \tag{2.5}$$

and finding a matrix $A$ such that $S_{n+1} = A S_n$. Since 

$$s_{n+k} = c_1 s_{n+k-1} + c_2 s_{n+k-2} + \cdots + c_k s_n,$$

this is accomplished using the matrix 

$$A = \begin{pmatrix} c_1 & c_2 & \cdots & c_{k-1} & c_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \tag{2.6}$$

which when applied to $S_n$ shifts the components down one entry and puts 
the next term, $s_{n+k}$, in the first component. The matrix $A$ is called the 
companion matrix of the recurrence. It is also referred to as the 
companion matrix of the polynomial $x^k - c_1 x^{k-1} - \cdots - c_k$. 

When the matrix form is used in an initial value problem, we have 

$$S_1 = A S_0, \quad S_2 = A S_1 = A^2 S_0, \quad S_3 = A S_2 = A^3 S_0, \quad \text{and so on},$$

and this inductively gives the matrix form 

$$S_n = A^n S_0. \tag{2.7}$$


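The matrix form of the usual Fibonacci recurrence is 

$$\begin{pmatrix} f_{n+1} \\ f_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

and the matrix form of the recurrence in (2.1) with initial conditions $s_0 = 2$ 
and $s_1 = 3$ is given by 

$$S_n = \begin{pmatrix} 3 & -2 \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} 3 \\ 2 \end{pmatrix}. \tag{2.8}$$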
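To see (2.6) and (2.7) in action, here is a small Python sketch (the helper names `companion`, `mat_mul`, `mat_pow`, and `state` are hypothetical). It builds the companion matrix from `[c_1, ..., c_k]` and evaluates $S_n = A^n S_0$ with exact integers; computing $A^n$ by repeated squaring is our own choice for speed, not something the text prescribes.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def companion(coeffs: Sequence[int]) -> Matrix:
    """Companion matrix (2.6): first row c_1, ..., c_k and ones on the subdiagonal."""
    k = len(coeffs)
    A = [[0] * k for _ in range(k)]
    A[0] = list(coeffs)
    for r in range(1, k):
        A[r][r - 1] = 1
    return A


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_pow(A: Matrix, n: int) -> Matrix:
    """A^n by repeated squaring, using O(log n) matrix products."""
    result = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    base = A
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


def state(coeffs: Sequence[int], S0: Sequence[int], n: int) -> List[int]:
    """S_n = A^n S_0, where S_0 = (s_{k-1}, ..., s_1, s_0)^T is passed as a list."""
    An = mat_pow(companion(coeffs), n)
    return [sum(An[i][j] * S0[j] for j in range(len(S0))) for i in range(len(S0))]


print(state([1, 1], [1, 0], 10))   # (f_11, f_10)
print(state([3, -2], [3, 2], 5))   # (s_6, s_5) for (2.8)
```

The two calls print `[89, 55]` and `[65, 33]`; the second agrees with $s_n = 2^n + 1$, which is derived below by diagonalization.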
For the formula in (2.7) to be useful, we must be able to quickly compute 
powers of the companion matrix. Matrix powers can be easily computed 
for diagonal matrices, and the next easiest are matrices that are diagonal- 
izable, namely, square matrices that have a basis of eigenvectors. When 
A is not diagonalizable, a modification involving Jordan matrices can be 
used. (Refer to Appendix C.) 

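Returning to the second example above, we set 

$$A = \begin{pmatrix} 3 & -2 \\ 1 & 0 \end{pmatrix},$$

which has characteristic polynomial 

$$\operatorname{ch}_A(x) = \det\begin{pmatrix} 3 - x & -2 \\ 1 & -x \end{pmatrix} = x^2 - 3x + 2 = (x - 2)(x - 1)$$

and distinct eigenvalues, $\lambda_1 = 2$ and $\lambda_2 = 1$. This means that $A$ is 
diagonalizable. To find the eigenvectors, solve $(A - 2I)v_1 = 0$ and $(A - I)v_2 = 0$ 
to obtain 

$$v_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$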

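Therefore, for 

$$P = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},$$

powers can be computed using 

$$A^n = (PDP^{-1})(PDP^{-1})\cdots(PDP^{-1}) = P D^n P^{-1}
= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 2^n & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}
= \begin{pmatrix} 2^{n+1} - 1 & -2^{n+1} + 2 \\ 2^n - 1 & -2^n + 2 \end{pmatrix}.$$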

Equation (2.8) therefore becomes 

$$S_n = \begin{pmatrix} s_{n+1} \\ s_n \end{pmatrix} = \begin{pmatrix} 2^{n+1} - 1 & -2^{n+1} + 2 \\ 2^n - 1 & -2^n + 2 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 2^{n+1} + 1 \\ 2^n + 1 \end{pmatrix},$$

from which we derive $s_n = 2^n + 1$ for all $n \ge 0$. A change of initial conditions 
is equivalent to changing the vector $S_0 = (3, 2)^T$, and the same procedure 
would yield the general term of the sequence generated from those new 
initial conditions. In Exercise 2.10 you apply this method to the Fibonacci 
sequence. 
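The diagonalization can be double-checked with exact integer arithmetic. The sketch below (all names hypothetical) hard-codes $P$, $D^n$ and $P^{-1}$ from this example and compares $PD^nP^{-1}$ both with the closed form for $A^n$ displayed above and with repeated multiplication by $A$.

```python
from typing import List

Matrix = List[List[int]]

A = [[3, -2], [1, 0]]
P = [[2, 1], [1, 1]]          # columns are the eigenvectors v_1 = (2, 1)^T and v_2 = (1, 1)^T
P_inv = [[1, -1], [-1, 2]]    # det P = 1, so the inverse has integer entries


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def power_by_diagonalization(n: int) -> Matrix:
    """A^n = P D^n P^{-1} with D = diag(2, 1)."""
    Dn = [[2 ** n, 0], [0, 1]]
    return mat_mul(mat_mul(P, Dn), P_inv)


def power_closed_form(n: int) -> Matrix:
    return [[2 ** (n + 1) - 1, -2 ** (n + 1) + 2],
            [2 ** n - 1, -2 ** n + 2]]


def power_by_repetition(n: int) -> Matrix:
    M = [[1, 0], [0, 1]]
    for _ in range(n):
        M = mat_mul(M, A)
    return M


for n in range(12):
    An = power_by_diagonalization(n)
    assert An == power_closed_form(n) == power_by_repetition(n)
    S_n = [An[0][0] * 3 + An[0][1] * 2, An[1][0] * 3 + An[1][1] * 2]
    assert S_n == [2 ** (n + 1) + 1, 2 ** n + 1]   # hence s_n = 2^n + 1
print("P D^n P^{-1} matches the closed form for 0 <= n < 12")
```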

Our general procedure for the case in which $A$ is diagonalizable can be 
summarized as follows: 

Let $s_{n+k} = c_1 s_{n+k-1} + c_2 s_{n+k-2} + \cdots + c_k s_n$ and $S_0 = \alpha$. Suppose the 
companion matrix $A$ of this recurrence is diagonalizable with eigenvalues 
$\lambda_1, \ldots, \lambda_k$. 

1. Find a basis for $\mathbb{C}^k$ consisting of eigenvectors of $A$. 
2. Use the basis elements as columns to form the matrix $P$. 
3. Form the diagonal matrix $D$ with the eigenvalues of $A$ on the diagonal, 
written in the order corresponding to the columns of $P$. 
4. Compute $P D^n P^{-1} = A^n$ and the vector $A^n \alpha$. 

Then the solution is the sequence of last components of the vectors $A^n \alpha$, 
since the last component of $S_n = A^n \alpha$ is $s_n$. 
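The four steps translate directly into code. The Python sketch below assumes that the eigenvalues and a matching basis of eigenvectors are already known (the procedure does not say how to find them); it solves $Py = \alpha$ by Gauss-Jordan elimination instead of forming $P^{-1}$ explicitly, which produces the same vector $P^{-1}\alpha$. It uses floating-point complex arithmetic, so results are approximate, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def solve_linear(M: Sequence[Sequence[complex]], b: Sequence[complex]) -> List[complex]:
    """Solve M y = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    aug = [[complex(x) for x in row] + [complex(rhs)] for row, rhs in zip(M, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


def diagonal_solution(eigenvalues: Sequence[complex],
                      eigenvectors: Sequence[Sequence[complex]],
                      alpha: Sequence[complex], n: int) -> complex:
    """Return s_n from A^n alpha = P D^n P^{-1} alpha.

    eigenvectors[i] is the column of P paired with eigenvalues[i], and
    alpha = S_0 = (s_{k-1}, ..., s_1, s_0)^T.
    """
    k = len(alpha)
    P = [[eigenvectors[j][i] for j in range(k)] for i in range(k)]   # steps 1-2
    y = solve_linear(P, alpha)                                        # y = P^{-1} alpha
    y = [lam ** n * yi for lam, yi in zip(eigenvalues, y)]            # steps 3-4: D^n y
    An_alpha = [sum(P[i][j] * y[j] for j in range(k)) for i in range(k)]
    return An_alpha[-1]   # the last component of S_n is s_n


phi, psi = (1 + math.sqrt(5)) / 2, (1 - math.sqrt(5)) / 2
# Fibonacci: [[1, 1], [1, 0]] has eigenvectors (phi, 1)^T and (psi, 1)^T; S_0 = (f_1, f_0)^T
print(diagonal_solution([phi, psi], [[phi, 1], [psi, 1]], [1, 0], 10))
```

The call should print a complex number numerically very close to 55, that is, $f_{10}$.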


2.3 A Simpler Basis for the Solution Space 


In the last section the eigenvalues of the companion matrix played a key 
role in solving a recurrence. In this section we explore more properties of 
these eigenvalues with the objective of obtaining a helpful basis for the 
solution space. We will refer to the polynomial 

$$\operatorname{ch}(x) = x^k - c_1 x^{k-1} - \cdots - c_{k-1} x - c_k$$

as the characteristic polynomial of the recurrence (HL), and its roots 
(the eigenvalues of the companion matrix $A$) will be called the eigenvalues 
of the recurrence. 

We can obtain an eigenvector associated with an eigenvalue $\lambda$ of the 
recurrence by first reminding ourselves that premultiplying a vector 
$v = (v_1, \ldots, v_k)^T$ by the companion matrix $A$ shifts its first $k - 1$ components 
down one position and then inserts $c_1 v_1 + c_2 v_2 + \cdots + c_k v_k$ into the first 
position. Since $\lambda$ is an eigenvalue, $c_1 \lambda^{k-1} + c_2 \lambda^{k-2} + \cdots + c_k = \lambda^k$, and we 
obtain 

$$A \begin{pmatrix} \lambda^{k-1} \\ \vdots \\ \lambda \\ 1 \end{pmatrix} = \begin{pmatrix} \lambda^k \\ \lambda^{k-1} \\ \vdots \\ \lambda \end{pmatrix} = \lambda \begin{pmatrix} \lambda^{k-1} \\ \vdots \\ \lambda \\ 1 \end{pmatrix}.$$

This means that 

$$v_\lambda = (\lambda^{k-1}, \ldots, \lambda, 1)^T$$

is an eigenvector associated with $\lambda$. 


Theorem 2.3.1. Consider the homogeneous recurrence 

$$s_n = c_1 s_{n-1} + \cdots + c_{k-1} s_{n-k+1} + c_k s_{n-k}.$$

(a) If $\lambda$ is an eigenvalue of the recurrence, then $\pi^{-1}(v_\lambda) = (\lambda^n)$, where 
$\pi$ is the vector space isomorphism in (2.2). Consequently, $(\lambda^n)$ is in $\mathcal{X}$. 

(b) If the recurrence has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, then the $k$ 
sequences $(\lambda_1^n), \ldots, (\lambda_k^n)$ form a basis for $\mathcal{X}$, and every solution $(s_n)$ 
has the form 

$$s_n = a_1 \lambda_1^n + a_2 \lambda_2^n + \cdots + a_k \lambda_k^n \tag{2.9}$$

for some constants $a_1, \ldots, a_k \in \mathbb{C}$. 


Proof. In part (a), for all $n \ge 0$ we define $s_n = \lambda^n$ and the associated 
vectors $S_n$ as in (2.5). Then $S_n = \lambda^n v_\lambda$ and 

$$A S_n = \lambda^n A v_\lambda = \lambda^{n+1} v_\lambda = S_{n+1},$$

implying $(s_n) \in \mathcal{X}$ with initial vector $S_0 = v_\lambda$. When the recurrence has 
$k$ distinct eigenvalues, the eigenvectors $v_{\lambda_i}$ form a basis for $\mathbb{C}^k$ (eigenvectors 
belonging to distinct eigenvalues are linearly independent), and the 
$k$ sequences $(\lambda_i^n) = \pi^{-1}(v_{\lambda_i})$ therefore form a basis for $\mathcal{X}$. Since every 
element of a vector space can be uniquely written as a linear combination 
of basis vectors, this implies that (2.9) does hold. □ 
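Theorem 2.3.1 is easy to probe numerically. This Python sketch (hypothetical helper names; the floating-point tolerance is our own choice) checks that $v_\lambda = (\lambda^{k-1}, \ldots, \lambda, 1)^T$ is an eigenvector of the companion matrix and that $(\lambda^n)$ satisfies the recurrence when $\lambda$ is a root of $\operatorname{ch}(x)$, but not otherwise.

```python
import math
from typing import List, Sequence


def companion(coeffs: Sequence[float]) -> List[List[float]]:
    k = len(coeffs)
    A = [[0.0] * k for _ in range(k)]
    A[0] = [float(c) for c in coeffs]
    for r in range(1, k):
        A[r][r - 1] = 1.0
    return A


def v_lambda(lam: complex, k: int) -> List[complex]:
    """v_lambda = (lam^{k-1}, ..., lam, 1)^T."""
    return [lam ** (k - 1 - i) for i in range(k)]


def is_eigenvector(coeffs: Sequence[float], lam: complex, tol: float = 1e-9) -> bool:
    A, v = companion(coeffs), v_lambda(lam, len(coeffs))
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return all(abs(p - lam * x) <= tol * max(1.0, abs(p)) for p, x in zip(Av, v))


def is_solution(coeffs: Sequence[float], seq: Sequence[complex], tol: float = 1e-9) -> bool:
    """Check s_{n+k} = c_1 s_{n+k-1} + ... + c_k s_n on a finite prefix of seq."""
    k = len(coeffs)
    for n in range(len(seq) - k):
        rhs = sum(c * seq[n + k - 1 - i] for i, c in enumerate(coeffs))
        if abs(seq[n + k] - rhs) > tol * max(1.0, abs(rhs)):
            return False
    return True


phi = (1 + math.sqrt(5)) / 2
print(is_eigenvector([1, 1], phi), is_solution([1, 1], [phi ** n for n in range(30)]))
print(is_solution([3, -2], [2 ** n for n in range(30)]),
      is_solution([3, -2], [3 ** n for n in range(30)]))
```

The first line prints `True True`; the second prints `True False`, since 2 is a root of $x^2 - 3x + 2$ and 3 is not.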
2.3.1 Distinct eigenvalues 


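For a specific vector $(s_0, s_1, \ldots, s_{k-1}) = \alpha$ of initial conditions, how do 
we solve the initial value problem, that is, how do we find the $a_i$ in (2.9)? 
Using the inherent linear algebra and the isomorphism $\pi$, this becomes the 
equivalent problem of solving 

$$a_1 v_{\lambda_1} + \cdots + a_k v_{\lambda_k} = (s_{k-1}, \ldots, s_1, s_0)^T$$

for the coefficients $a_1, \ldots, a_k$. Translating this to a matrix equation (and listing 
the components in order of increasing index), we want to find (the unique) 
$a_1, \ldots, a_k \in \mathbb{C}$ such that 

$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_k \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_k^2 \\ \vdots & \vdots & & \vdots \\ \lambda_1^{k-1} & \lambda_2^{k-1} & \cdots & \lambda_k^{k-1} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{pmatrix} = \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{k-1} \end{pmatrix}. \tag{2.10}$$

The coefficient matrix in (2.10) is called the Vandermonde matrix 
associated with $\lambda_1, \ldots, \lambda_k$. It is named after Alexandre Vandermonde 
(1735–1796), the developer of the modern theory of determinants, and has 
applications in almost every area of mathematics. (Consult [82] for an interesting 
survey of some of its many applications.) Because each vector of initial 
conditions specifies an element of $\mathcal{X}$, the system of equations in (2.10) has a 
unique solution for each $\alpha$, and the Vandermonde matrix of distinct 
$\lambda_1, \ldots, \lambda_k$ must always be invertible. 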
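Let’s look at an example. For $c_1^2 \ne -4c_2$, consider the recurrence 

$$s_{n+2} = c_1 s_{n+1} + c_2 s_n, \qquad s_0 = \alpha_1,\ s_1 = \alpha_2,$$

whose characteristic polynomial $\operatorname{ch}(x) = x^2 - c_1 x - c_2$ has distinct 
eigenvalues, since its discriminant $D = c_1^2 + 4c_2$ is non-zero. (In Exercise 2.18 
you answer the question of what happens when $D$ is zero.) The eigenvalues 
of the recurrence are 

$$\lambda_1 = (c_1 + \sqrt{D})/2 \quad\text{and}\quad \lambda_2 = (c_1 - \sqrt{D})/2,$$

with associated Vandermonde matrix 

$$V = \begin{pmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{pmatrix}.$$

The determinant of this matrix is $\lambda_2 - \lambda_1 = -\sqrt{D}$, which is non-zero. Its 
inverse is 

$$V^{-1} = \frac{1}{\lambda_2 - \lambda_1} \begin{pmatrix} \lambda_2 & -1 \\ -\lambda_1 & 1 \end{pmatrix} = \frac{1}{\sqrt{D}} \begin{pmatrix} -\lambda_2 & 1 \\ \lambda_1 & -1 \end{pmatrix},$$

and 

$$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = V^{-1} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \frac{1}{\sqrt{D}} \begin{pmatrix} \alpha_2 - \lambda_2 \alpha_1 \\ \lambda_1 \alpha_1 - \alpha_2 \end{pmatrix}.$$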
In general, the inverse of a Vandermonde matrix can be computed fairly 
easily. For this, define the auxiliary polynomials 

$$P_i(x) = \prod_{\substack{j=1 \\ j \ne i}}^{k} (x - \lambda_j) = b_{i,1} + b_{i,2}\, x + \cdots + b_{i,k}\, x^{k-1}$$

for all $i = 1, \ldots, k$. Since the $\lambda_j$ are distinct, $P_i(\lambda_i)$ is non-zero for all 
$i$, and for $j \ne i$, $P_i(\lambda_j) = 0$. 


Theorem 2.3.2. Let $B$ be the $k \times k$ matrix whose $i^{\text{th}}$ row records the 
coefficients of the polynomial $P_i(x)$ written according to increasing powers 
of $x$. Then $V^{-1}$ equals $DB$, where $D$ is the diagonal matrix that has the 
(non-zero) diagonal entries $1/P_1(\lambda_1), \ldots, 1/P_k(\lambda_k)$. 


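Proof. This proof is inspired by a mathematical note of F. D. Parker [124]. 
The diagonal entries in the matrix product $BV$ have the form 

$$(b_{i,1}, b_{i,2}, \ldots, b_{i,k}) \cdot (1, \lambda_i, \ldots, \lambda_i^{k-1})^T = P_i(\lambda_i) \ne 0,$$

and the off-diagonal entries are 

$$(b_{i,1}, b_{i,2}, \ldots, b_{i,k}) \cdot (1, \lambda_j, \ldots, \lambda_j^{k-1})^T = P_i(\lambda_j) = 0 \qquad (j \ne i).$$

From this we see that $DBV$ is the identity and $V^{-1} = DB$, as claimed. □ 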


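Let’s use this result to find an explicit closed form for the general term 
of 

$$s_{n+2} = s_{n+1} + 2 s_n, \qquad s_0 = 1,\ s_1 = 5.$$

Since the characteristic polynomial $\operatorname{ch}(x) = x^2 - x - 2 = (x - 2)(x + 1)$ has 
roots $\lambda_1 = 2$, $\lambda_2 = -1$, the polynomials $P_1(x) = 1 + x$ and $P_2(x) = -2 + x$ give 

$$B = \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix},$$

and from $P_1(2) = 3$, $P_2(-1) = -3$ we have 

$$D = \begin{pmatrix} 1/3 & 0 \\ 0 & -1/3 \end{pmatrix}.$$

Therefore, 

$$V^{-1} = DB = \begin{pmatrix} 1/3 & 1/3 \\ 2/3 & -1/3 \end{pmatrix}, \qquad \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = V^{-1} \begin{pmatrix} 1 \\ 5 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix},$$

and from (2.9), 

$$s_n = 2\lambda_1^n - \lambda_2^n = 2^{n+1} + (-1)^{n+1}.$$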
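Theorem 2.3.2 is effectively an algorithm: expand each $P_i(x)$, divide its coefficient row by $P_i(\lambda_i)$, and stack the rows. Here is a sketch in Python with exact rational arithmetic from the standard `fractions` module (the function names are our own), applied to the example just computed.

```python
from fractions import Fraction
from typing import List, Sequence


def poly_from_roots(roots: Sequence[Fraction]) -> List[Fraction]:
    """Coefficients, by increasing powers of x, of the product of (x - r) over r in roots."""
    coeffs = [Fraction(1)]
    for r in roots:
        shifted = [Fraction(0)] + coeffs                    # x * p(x)
        scaled = [-r * c for c in coeffs] + [Fraction(0)]   # -r * p(x)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs


def poly_eval(coeffs: Sequence[Fraction], x: Fraction) -> Fraction:
    return sum((c * x ** i for i, c in enumerate(coeffs)), Fraction(0))


def vandermonde_inverse(values: Sequence[int]) -> List[List[Fraction]]:
    """V^{-1} = D B: row i is the coefficient row of P_i(x) divided by P_i(lambda_i)."""
    lams = [Fraction(v) for v in values]
    rows = []
    for i, lam_i in enumerate(lams):
        P_i = poly_from_roots([lam_j for j, lam_j in enumerate(lams) if j != i])  # row i of B
        d = poly_eval(P_i, lam_i)            # non-zero because the lambdas are distinct
        rows.append([c / d for c in P_i])    # row i of D B
    return rows


# Example from the text: s_{n+2} = s_{n+1} + 2 s_n, s_0 = 1, s_1 = 5, lambda_1 = 2, lambda_2 = -1
V_inv = vandermonde_inverse([2, -1])
a = [sum(row[j] * s for j, s in enumerate([1, 5])) for row in V_inv]  # (a_1, a_2) = V^{-1}(s_0, s_1)
print(V_inv)   # rows (1/3, 1/3) and (2/3, -1/3)
print(a)       # a_1 = 2, a_2 = -1, so s_n = 2^{n+1} + (-1)^{n+1}
```

Exact fractions avoid the rounding that a floating-point inverse would introduce; for larger $k$ the same routine works unchanged as long as the $\lambda_i$ are distinct.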
2.3.2 Repeated eigenvalues 


In the previous section we showed that the vectors $v_{\lambda_1}, \ldots, v_{\lambda_k}$ form a 
basis for $\mathbb{C}^k$ when the eigenvalues are distinct. What happens when the 
recurrence has some eigenvalues that are repeated roots of the characteristic 
polynomial? The vectors $v_\lambda$ (one for each different eigenvalue) still form a 
linearly independent set, but the set is no longer a basis, since it has fewer 
than $k$ elements. Although we will do considerably more work to identify 
enough additional vectors to form a basis, the actual result (given in 
Theorem 2.3.6 below) is only slightly more complicated. 

When $\lambda$ is an eigenvalue of multiplicity $m$, there exists a polynomial $f(x)$ 
with complex coefficients such that 

$$\operatorname{ch}(x) = (x - \lambda)^m f(x), \quad \text{where } f(\lambda) \ne 0.$$

Denoting the differentiation operator on the space of polynomials by $D$, this 
implies that the value of $D^j(\operatorname{ch}(x))$ at $x = \lambda$ equals zero for all $0 \le j < m$ 
and is non-zero for $j = m$. Using the fact that 

$$D^j(x^i) = \begin{cases} \dfrac{i!}{(i-j)!}\, x^{i-j} & \text{if } j \le i, \\[4pt] 0 & \text{if } j > i, \end{cases}$$

in Exercise 2.13 you prove the polynomial identity 

$$D^j(x^{i+1}) = x\, D^j(x^i) + j\, D^{j-1}(x^i), \tag{2.11}$$

where $D^0$ is the identity operator and the second summand on the right 
side should be interpreted as zero when $j = 0$. 

We next construct what are called the cyclic subspaces of $\mathbb{C}^k$. This study 
is motivated by another concept in linear algebra, the Rational Canonical 
Form, a topic that is more advanced than our review in Appendix C. 
Because our description is explicit, what follows can be viewed as an 
illustration of Rational Canonical Form. You can consult [78, Sections 7.1–7.2] 
if you’re interested in more information. 

The cyclic subspace corresponding to the eigenvalue $\lambda$ is defined 
to be 

$$X_\lambda = \operatorname{Span}\{v_\lambda^{(0)}, \ldots, v_\lambda^{(m-1)}\}, \tag{2.12}$$

and the next result implies that the companion matrix $A$ maps $X_\lambda$ into 
$X_\lambda$, which means that $A$ is an operator on the subspace $X_\lambda$. This will allow 
us to apply the theory of linear algebra to the vector space $X_\lambda$. 


Proposition 2.3.3. If $\lambda$ is an eigenvalue of multiplicity $m$, then 

$$A v_\lambda^{(j)} = \lambda\, v_\lambda^{(j)} + j\, v_\lambda^{(j-1)} \quad \text{for all } 0 \le j < m, \tag{2.13}$$

where $v_\lambda^{(-1)} = 0$, $v_\lambda^{(0)} = v_\lambda$, and for all $j \ge 1$, $v_\lambda^{(j)}$ denotes the element 
of $\mathbb{C}^k$ obtained by applying $D^j$ to each component of $(x^{k-1}, \ldots, x, 1)^T$ and 
then evaluating each component at $x = \lambda$. 


Proof. Setting $\mathbf{x} = (x^{k-1}, \ldots, x, 1)^T$, we will prove 

$$A D^j(\mathbf{x}) = x\, D^j(\mathbf{x}) + j\, D^{j-1}(\mathbf{x}) \quad \text{at } x = \lambda, \text{ for all } j < m, \tag{2.14}$$

which yields (2.13). From the linearity of the operator $D$ we obtain 
$A D^j(\mathbf{x}) = D^j(A\mathbf{x})$. 

Recall that for any vector $v$, the last $k - 1$ components of $Av$ are obtained 
by shifting down the first $k - 1$ components of $v$. Also, for any $i \ne 1$, the $i^{\text{th}}$ 
component of $A D^j(\mathbf{x})$ is $D^j(x^{k-i+1})$, which from (2.11) equals 
$x D^j(x^{k-i}) + j D^{j-1}(x^{k-i})$. This proves the equality of the last $k - 1$ components 
in the vector equation (2.14), in fact as polynomial identities. 

Now we compare the first components. The first component of $A\mathbf{x}$ is 

$$c_1 x^{k-1} + \cdots + c_{k-1} x + c_k = x^k - \operatorname{ch}(x),$$

which implies that the first component of $D^j(A\mathbf{x})$ is 

$$\begin{aligned}
D^j\bigl(x^k - \operatorname{ch}(x)\bigr) &= D^j(x^k) - D^j(\operatorname{ch}(x)) \\
&= x D^j(x^{k-1}) + j D^{j-1}(x^{k-1}) - D^j(\operatorname{ch}(x)),
\end{aligned}$$

again from (2.11). Equality of the first components in (2.14) is obtained 
from this and the fact that $D^j(\operatorname{ch}(x))$ has value 0 at $x = \lambda$. □ 
Theorem 2.3.4. If $\lambda$ is an eigenvalue of multiplicity $m$, then $(x - \lambda)^m$ is 
the minimal polynomial of the operator $A$ on $X_\lambda$, and $S = \{v_\lambda^{(0)}, \ldots, v_\lambda^{(m-1)}\}$ 
is a basis for $X_\lambda$. 

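Proof. From the definition of $X_\lambda$ in (2.12), $\dim(X_\lambda) \le m$ and $S$ is a 
generating set for $X_\lambda$. It suffices to prove $\dim(X_\lambda) \ge m$. We do this by showing 
that the degree of $\min(x)$, the minimal polynomial of $A$ restricted to the 
subspace $X_\lambda$, equals $m$; this suffices because the degree of the minimal 
polynomial of an operator never exceeds the dimension of the space it acts on. 
In fact, we prove that $\min(x) = (x - \lambda)^m$. To prove this we will show that 
for all $0 \le j < m$, 

$$(A - \lambda I)^j v_\lambda^{(j)} \ne 0 \quad\text{and}\quad (A - \lambda I)^m v_\lambda^{(j)} = 0. \tag{2.15}$$

Fix $j < m$. Then (2.13) can be rewritten in the form 

$$(A - \lambda I)\, v_\lambda^{(j)} = j\, v_\lambda^{(j-1)},$$

and repeated use of this gives 

$$\begin{aligned}
(A - \lambda I)^j v_\lambda^{(j)} &= (A - \lambda I)^{j-1}\bigl((A - \lambda I) v_\lambda^{(j)}\bigr) \\
&= j\,(A - \lambda I)^{j-1} v_\lambda^{(j-1)} \\
&= j\,(A - \lambda I)^{j-2}\bigl((A - \lambda I) v_\lambda^{(j-1)}\bigr) \\
&= j(j-1)\,(A - \lambda I)^{j-2} v_\lambda^{(j-2)} \\
&\;\;\vdots \\
&= j!\, v_\lambda^{(0)},
\end{aligned}$$

which means that 

$$(A - \lambda I)^j v_\lambda^{(j)} = j!\, v_\lambda^{(0)} \quad \text{for all } j < m. \tag{2.16}$$

Since $v_\lambda^{(0)}$ is an eigenvector, it is non-zero, and therefore $(A - \lambda I)^j$ is not 
the zero transformation on $X_\lambda$. This proves that the minimal polynomial is 
not $(x - \lambda)^j$ for any $j < m$. On the other hand, from (2.16) we also obtain 
for all $j = 0, 1, \ldots, m - 1$, 

$$\begin{aligned}
(A - \lambda I)^m v_\lambda^{(j)} &= j! \cdot (A - \lambda I)^{m-j} v_\lambda^{(0)} \\
&= j! \cdot (A - \lambda I)^{m-j-1}\bigl((A - \lambda I) v_\lambda^{(0)}\bigr) \\
&= j! \cdot (A - \lambda I)^{m-j-1}\, 0 = 0,
\end{aligned}$$

since $v_\lambda^{(0)} = v_\lambda$ is an eigenvector corresponding to $\lambda$. Therefore, $(A - \lambda I)^m$ 
is the zero transformation on the generating set $\{v_\lambda^{(0)}, \ldots, v_\lambda^{(m-1)}\}$, and by 
linearity, $(A - \lambda I)^m$ must be zero on all of $X_\lambda$. Hence $\min(x)$ divides $(x - \lambda)^m$, 
so it is a power of $x - \lambda$, and together with the previous observation this 
proves $\min(x) = (x - \lambda)^m$. □ 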
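Proposition 2.3.3, identity (2.16), and the nilpotency step in the proof above can all be confirmed in exact integer arithmetic for a concrete recurrence. The sketch below uses $s_{n+4} = s_{n+3} + 3s_{n+2} - 5s_{n+1} + 2s_n$, whose characteristic polynomial $(x-1)^3(x+2)$ reappears in the worked example later in this section. Python's `math.perm(p, j)` (Python 3.8 or later) supplies the falling factorial $p!/(p-j)!$; the helper names are hypothetical.

```python
from math import factorial, perm
from typing import List, Sequence


def companion(coeffs: Sequence[int]) -> List[List[int]]:
    k = len(coeffs)
    A = [[0] * k for _ in range(k)]
    A[0] = list(coeffs)
    for r in range(1, k):
        A[r][r - 1] = 1
    return A


def mat_vec(A: List[List[int]], v: Sequence[int]) -> List[int]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def v_gen(lam: int, j: int, k: int) -> List[int]:
    """v_lambda^{(j)}: apply D^j to (x^{k-1}, ..., x, 1) and evaluate at x = lam."""
    return [perm(p, j) * lam ** (p - j) if j <= p else 0 for p in range(k - 1, -1, -1)]


def apply_shifted(A: List[List[int]], lam: int, v: List[int], times: int) -> List[int]:
    """Apply (A - lam I) to v the given number of times."""
    for _ in range(times):
        v = [a - lam * x for a, x in zip(mat_vec(A, v), v)]
    return v


def check_cyclic_subspace(coeffs: Sequence[int], lam: int, m: int) -> bool:
    """Check (2.13), (2.16) and (A - lam I)^m v^{(j)} = 0 for all 0 <= j < m."""
    k = len(coeffs)
    A = companion(coeffs)
    v0 = v_gen(lam, 0, k)
    for j in range(m):
        vj = v_gen(lam, j, k)
        prev = v_gen(lam, j - 1, k) if j > 0 else [0] * k
        if mat_vec(A, vj) != [lam * a + j * b for a, b in zip(vj, prev)]:   # (2.13)
            return False
        if apply_shifted(A, lam, vj, j) != [factorial(j) * a for a in v0]:  # (2.16)
            return False
        if apply_shifted(A, lam, vj, m) != [0] * k:                         # nilpotency
            return False
    return True


coeffs = [1, 3, -5, 2]                      # ch(x) = (x - 1)^3 (x + 2)
print(check_cyclic_subspace(coeffs, 1, 3), check_cyclic_subspace(coeffs, -2, 1))
print(check_cyclic_subspace(coeffs, 1, 4))  # 1 has multiplicity 3, so (2.13) fails at j = 3
```

The first line prints `True True` and the second `False`: at $j = m$ the identity (2.13) breaks down precisely because $D^m(\operatorname{ch}(x))$ does not vanish at $x = \lambda$.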


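Corollary 2.3.5. Let $\lambda_1, \ldots, \lambda_t$ be the different eigenvalues of the 
recurrence, with corresponding multiplicities $m_1, \ldots, m_t$. Then 

$$\{v_{\lambda_1}^{(0)}, \ldots, v_{\lambda_1}^{(m_1-1)},\ v_{\lambda_2}^{(0)}, \ldots, v_{\lambda_2}^{(m_2-1)},\ \ldots,\ v_{\lambda_t}^{(0)}, \ldots, v_{\lambda_t}^{(m_t-1)}\} \tag{2.17}$$

is a basis for $\mathbb{C}^k$. 

Proof. For each $i$, let $X_i = X_{\lambda_i}$, and let $\min_i(x) = (x - \lambda_i)^{m_i}$ denote the minimal 
polynomial of $A$ restricted to $X_i$ (Theorem 2.3.4). For each $i$, the subspace 
$W_i = X_i \cap \sum_{j \ne i} X_j$ is $A$-invariant, and its minimal polynomial divides both 
$\min_i(x)$ and $\prod_{j \ne i}(x - \lambda_j)^{m_j}$ (the latter annihilates every $X_j$ with $j \ne i$, hence 
also their sum), so it also divides their greatest common divisor. But the 
$\lambda_j$ are distinct, so $\gcd\bigl((x - \lambda_i)^{m_i}, \prod_{j \ne i}(x - \lambda_j)^{m_j}\bigr) = 1$, implying that the 
minimal polynomial of $A$ on $W_i$ is $1$ and $W_i = \{0\}$. Therefore, the sum of the 
$X_i$ is direct: 

$$\operatorname{Span}(X_1 \cup X_2 \cup \cdots \cup X_t) = X_1 \oplus X_2 \oplus \cdots \oplus X_t,$$

giving, by Theorem 2.3.4, 

$$\dim(X_1 \oplus X_2 \oplus \cdots \oplus X_t) = m_1 + \cdots + m_t,$$

which does equal $k = \deg(\operatorname{ch})$ since the characteristic polynomial is 
$\operatorname{ch}(x) = (x - \lambda_1)^{m_1} \cdots (x - \lambda_t)^{m_t}$. □ 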


This basis is the simpler one we promised earlier, and using this basis in 
Theorem 2.1.1 gives the following result. 


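Theorem 2.3.6. Let $\lambda_1, \ldots, \lambda_t \in \mathbb{C}$ be distinct. Then $(s_n)$ is a solution 
to the recurrence (HL) with $\operatorname{ch}(x) = (x - \lambda_1)^{m_1} \cdots (x - \lambda_t)^{m_t}$ iff the $n^{\text{th}}$ 
term of $(s_n)$ has the form 

$$s_n = a_1(n) \lambda_1^n + \cdots + a_t(n) \lambda_t^n, \tag{2.18}$$

where each $a_i(x)$ is a polynomial whose degree is less than $m_i$. 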


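Proof. In Exercise 2.15 you check that each sequence whose $n^{\text{th}}$ term has 
the form given in (2.18) satisfies (HL). If $\{b_1, \ldots, b_k\}$ is the basis in (2.17), 
the fact that $\pi$ is an isomorphism from $\mathcal{X}$ to $\mathbb{C}^k$ implies that every solution 
to (HL) has the form 

$$(s_n) = \beta_1 \pi^{-1}(b_1) + \cdots + \beta_k \pi^{-1}(b_k), \tag{2.19}$$

for some complex constants $\beta_1, \ldots, \beta_k$. Each $b_i$ equals $v_\lambda^{(j)}$ for some 
eigenvalue $\lambda$, and $j$ is less than the multiplicity of $\lambda$. The $n^{\text{th}}$ term of $\pi^{-1}(v_\lambda^{(j)})$ 
is $D^j(x^n)$ evaluated at $x = \lambda$; that is, it equals 

$$\begin{cases} \dfrac{n!}{(n-j)!}\, \lambda^{n-j} & \text{if } j \le n, \\[4pt] 0 & \text{if } j > n, \end{cases}$$

where the coefficient $n!/(n-j)! = n(n-1)\cdots(n-j+1)$ is a polynomial in $n$ whose 
degree equals $j$ (and which vanishes when $n < j$). Therefore, (2.19) yields the 
result. □ 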


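As an illustration of the method consider the following initial value 
problem: 

$$s_{n+4} = s_{n+3} + 3 s_{n+2} - 5 s_{n+1} + 2 s_n, \quad\text{with } s_0 = 1,\ s_1 = -1,\ s_2 = 0,\ s_3 = 1.$$

Then $\operatorname{ch}(x) = x^4 - x^3 - 3x^2 + 5x - 2$, and its companion matrix is 

$$A = \begin{pmatrix} 1 & 3 & -5 & 2 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$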




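with eigenvalues $\lambda_1 = 1$, $\lambda_2 = -2$ and respective multiplicities $m_1 = 3$, 
$m_2 = 1$ (indeed, $\operatorname{ch}(x) = (x - 1)^3 (x + 2)$). Therefore, there exist polynomials 
$a_1(x), a_2(x)$ such that any solution $(s_n)$ to the recurrence has general term 

$$s_n = a_1(n)(1)^n + a_2(n)(-2)^n.$$

Further, $\deg(a_1) < 3$ and $\deg(a_2) < 1$ imply that there exist constants 
$\beta_i \in \mathbb{C}$ such that 

$$a_1(x) = \beta_0 + \beta_1 x + \beta_2 x^2 \quad\text{and}\quad a_2(x) = \beta_3,$$

which give 

$$s_n = (\beta_0 + \beta_1 n + \beta_2 n^2) + \beta_3 (-2)^n.$$

Using the initial values, this reduces to solving a system of four equations 
in the unknowns $\beta_0, \beta_1, \beta_2, \beta_3$ with augmented matrix 

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & -2 & -1 \\ 1 & 2 & 4 & 4 & 0 \\ 1 & 3 & 9 & -8 & 1 \end{array}\right).$$

Gaussian elimination can be used to obtain the row reduced echelon form 

$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 8/9 \\ 0 & 1 & 0 & 0 & -8/3 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1/9 \end{array}\right),$$

and from this $a_1(x) = \frac{8}{9} - \frac{8}{3}x + x^2$ and $a_2(x) = \frac{1}{9}$, so that 

$$s_n = n^2 - \frac{8}{3}n + \frac{8}{9} + \frac{(-2)^n}{9}.$$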
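The linear algebra in this example can be reproduced exactly. The Python sketch below (hypothetical helper `gauss_jordan`, rational arithmetic via `fractions.Fraction`) sets up the augmented matrix with rows $(1, n, n^2, (-2)^n \mid s_n)$ for $n = 0, 1, 2, 3$, row-reduces it, and compares the resulting closed form with direct iteration of the recurrence.

```python
from fractions import Fraction
from typing import List, Sequence


def gauss_jordan(aug: Sequence[Sequence[int]]) -> List[Fraction]:
    """Row-reduce a square augmented system [M | b] over the rationals; return the solution."""
    n = len(aug)
    M = [[Fraction(x) for x in row] for row in aug]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]               # scale the pivot row to a leading 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[-1] for row in M]


initial = [1, -1, 0, 1]                                  # s_0, s_1, s_2, s_3
aug = [[1, n, n * n, (-2) ** n, initial[n]] for n in range(4)]
b0, b1, b2, b3 = gauss_jordan(aug)
print(b0, b1, b2, b3)                                    # 8/9 -8/3 1 1/9


def closed_form(n: int) -> Fraction:
    return b0 + b1 * n + b2 * n * n + b3 * (-2) ** n


seq = list(initial)
while len(seq) < 20:
    seq.append(seq[-1] + 3 * seq[-2] - 5 * seq[-3] + 2 * seq[-4])
print(all(closed_form(n) == seq[n] for n in range(20)))  # True
```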


The $k \times k$ matrix $V$ whose columns consist of the vectors $v_\lambda^{(j)}$ (written 
as columns according to the order given in (2.17)) can be called the 
generalized Vandermonde matrix associated with the recurrence. In order 
to find the polynomials $a_1(x), \ldots, a_t(x)$ corresponding to the initial 
conditions $s_0, s_1, \ldots, s_{k-1}$, we solve the matrix equation $Vx = (s_{k-1}, \ldots, s_1, s_0)^T$, 
whose solution lists the constants $\beta_1, \ldots, \beta_k$ of (2.19). As 
with the standard Vandermonde matrix, the fact that every initial value 
problem for (HL) has a unique solution translates to the invertibility of 
the generalized Vandermonde matrix. The explicit form of $V^{-1}$ is found in 
Exercise 2.17. 


2.4 The Asymptotic Behavior of Solutions 


We will now use the matrix form $S_n = A^n S_0$ to estimate the asymptotic 
size of solutions to (HL) without computing $A^n$. The basic idea is to write 
the initial condition $S_0$ as a linear combination of $k$ linearly independent 
vectors $C_1, \ldots, C_k$ for which the asymptotic size of each $A^n C_i$ is known. 

We first consider the principal case for applications, the case in which $A$ is 
a diagonalizable matrix. This means that there exists a basis $\{C_1, \ldots, C_k\}$ 
of $\mathbb{C}^k$ consisting of eigenvectors of $A$, and we can write $S_0 = \sum_{i=1}^k a_i C_i$ for some 
$a_i \in \mathbb{C}$. If all of the dot products $C_i^T C_j$ with $i \ne j$ happen to be zero (for real 
eigenvectors the basis is then an orthogonal set, and the matrix $A$ is said to be 
orthogonally diagonalizable), then 

$$C_j^T S_0 = C_j^T \sum_{i=1}^k a_i C_i = \sum_{i=1}^k a_i\, C_j^T C_i = a_j \|C_j\|^2,$$

giving $a_j = C_j^T S_0 / \|C_j\|^2$, where $\|C\|$ is the Euclidean length of the 
vector $C$. Orthogonality therefore gives an easy expansion of $S_0$ in terms of 
the column eigenvectors of $A$. Although the orthogonality of the 
eigenvectors can’t be guaranteed, we will see that a helpful property that we call 
biorthogonality always holds when $A$ is diagonalizable. 

If $\lambda$ is an eigenvalue, then each of the pair of homogeneous systems 
$(A - \lambda I)X = 0$ and $X(A - \lambda I) = 0$ has a non-zero solution. For our 
purposes, we’ll call a non-zero solution to $(A - \lambda I)X = 0$ a column 
eigenvector, while a non-zero solution to the second system will be referred to 
as a row eigenvector. Letting $Z$ be the $k \times k$ matrix whose columns are 
the column eigenvectors $C_1, \ldots, C_k$ of $A$, then $AZ = ZD$, where $D$ is the 
diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_k$. From the linear 
independence of its columns, $Z$ is an invertible matrix and $Z^{-1}A = DZ^{-1}$, which 
means that the $i^{\text{th}}$ row $R_i$ of $Z^{-1}$ is a row eigenvector corresponding to $\lambda_i$. 
From $Z^{-1}Z = I$ we also know that $R_i C_j = 0$ when $i \ne j$, and we call the 
two sets of vectors $\{R_1, \ldots, R_k\}$ and $\{C_1, \ldots, C_k\}$ a pair of biorthogonal 
sets. (Also note that $R_i C_i = 1$ for all $i$.) Mimicking the orthogonal case, 
from $S_0 = a_1 C_1 + \cdots + a_k C_k$ we have 

$$R_i S_0 = R_i(a_1 C_1 + \cdots + a_k C_k) = a_i R_i C_i = a_i.$$

Therefore, $S_0 = \sum_{i=1}^k (R_i S_0) C_i$, and 

$$S_n = A^n S_0 = \sum_{i=1}^k (R_i S_0)(A^n C_i), \tag{2.20}$$

which we record in the following lemma. 
Lemma 2.4.1. Let $A$ be any diagonalizable $k \times k$ matrix. Let $Z$ be the 
matrix whose columns are linearly independent eigenvectors $C_1, \ldots, C_k$, and 
let $R_1, \ldots, R_k$ be the rows of $Z^{-1}$. Then any solution to $S_{n+1} = A S_n$ can 
be written in the form 

$$S_n = \sum_{i=1}^k (R_i S_0)\, A^n C_i.$$
When $A$ has fewer than $k$ linearly independent eigenvectors, then the 
matrix $Z$ whose columns are the generalized eigenvectors (which form 
a basis for $\mathbb{C}^k$) still satisfies (2.20), where $R_1, \ldots, R_k$ are the rows of the 
matrix $Z^{-1}$. 
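For the running example $A = \begin{pmatrix} 3 & -2 \\ 1 & 0 \end{pmatrix}$ from Section 2.2, the biorthogonal pair and the expansion (2.20) can be checked directly. In this Python sketch (names hypothetical), the columns of $Z$ are the eigenvectors $(2,1)^T$ and $(1,1)^T$ found earlier, and the rows of $Z^{-1}$ come from the $2 \times 2$ inverse formula, using exact fractions.

```python
from fractions import Fraction
from typing import List, Sequence

A = [[3, -2], [1, 0]]
lams = [2, 1]
C = [[2, 1], [1, 1]]                                   # column eigenvectors C_1, C_2


def inverse_2x2(Z: Sequence[Sequence[int]]) -> List[List[Fraction]]:
    (a, b), (c, d) = Z
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def dot(u: Sequence, v: Sequence):
    return sum(x * y for x, y in zip(u, v))


Z = [[C[j][i] for j in range(2)] for i in range(2)]    # C_j is the j-th column of Z
R = inverse_2x2(Z)                                     # R_i is the i-th row of Z^{-1}


def S(S0: Sequence[int], n: int) -> List[Fraction]:
    """(2.20): S_n = sum_i (R_i S_0) lambda_i^n C_i, using A^n C_i = lambda_i^n C_i."""
    weights = [dot(R[i], S0) for i in range(2)]
    return [sum(weights[i] * lams[i] ** n * C[i][r] for i in range(2)) for r in range(2)]


print([[dot(R[i], C[j]) for j in range(2)] for i in range(2)])  # biorthogonality: the identity
print(S([3, 2], 5))                                              # (s_6, s_5) = (65, 33)
```

Here $R_1 = (1, -1)$ and $R_2 = (-1, 2)$, and both weights $R_i S_0$ equal 1 for $S_0 = (3, 2)^T$, which recovers $s_n = 2^n + 1$.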

When we order the different eigenvalues according to decreasing complex 
modulus, 

$$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_t| \quad (\text{where } t \le k), \tag{2.21}$$

$\lambda_1$ is called a dominant eigenvalue of the recurrence. If $|\lambda_1| > |\lambda_2|$ 
holds, then $\lambda_1$ is called the strictly dominant eigenvalue. (Note that 
strict dominance does not prevent $\lambda_1$ from being a multiple eigenvalue.) 
For the case in which $\lambda_1$ is a simple strictly dominant eigenvalue, we have 


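$$S_n = A^n S_0 = \sum_{i=1}^k (R_i S_0)(A^n C_i) = (R_1 S_0)\lambda_1^n C_1 + \sum_{i=2}^k (R_i S_0) A^n C_i,$$

and so 

$$\frac{S_n}{\lambda_1^n} = (R_1 S_0) C_1 + Y_n,$$

where $Y_n = \sum_{i=2}^k (R_i S_0) A^n C_i / \lambda_1^n$. Each coordinate of $\sum_{i=2}^k (R_i S_0) A^n C_i$ is 
bounded in modulus by $|\lambda_2|^n$ times a polynomial in $n$ (refer to Theorem 2.3.6), 
and $|\lambda_2| < |\lambda_1|$, so each coordinate of $Y_n$ tends to zero. Therefore, 

$$\lim_{n \to \infty} \frac{S_n}{\lambda_1^n} = (R_1 S_0) C_1,$$

where $C_1$ is a column eigenvector of $\lambda_1$ and $R_1$ is the specific row eigenvector 
of $\lambda_1$ chosen above. For any other row eigenvector $R$ corresponding to $\lambda_1$, 
$(R S_0)/(R C_1) = R_1 S_0$ holds, and we obtain the following theorem. 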


Theorem 2.4.2. If $\lambda_1$ is a simple strictly dominant eigenvalue of the $k \times k$ 
matrix $A$, then 

$$\lim_{n \to \infty} \frac{A^n S_0}{\lambda_1^n} = a\, C,$$

where $C$ is any column eigenvector corresponding to $\lambda_1$, $R$ is any row 
eigenvector corresponding to $\lambda_1$, and $a = (R S_0)/(R C)$, a quotient of dot 
products. 


The upshot of this result is that for many reasonable homogeneous 
recurrences and most initial conditions, the asymptotic behavior of a solution 
can be considered to be one-dimensional. Full details of a solution would 
require a somewhat more complicated formula, but as long as the dot product 
$R S_0$ is non-zero, the solution converges in a ratio sense to a single vector: 
$S_n / \lambda_1^n \to aC$ with $a \ne 0$. When the dot product $R S_0$ is zero, the theorem 
says only that the solution grows more slowly than $|\lambda_1|^n$, and it gives no more 
information about the solution. The other main point is that the value of $a$ 
can be computed without computing the rest of the coefficients in the full 
expansion of the solution. 
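Theorem 2.4.2 can be watched numerically. The Python sketch below (names hypothetical) compares $S_n / \lambda_1^n$ for $n = 60$ with the predicted limit $aC$, $a = (RS_0)/(RC)$, for the Fibonacci matrix, where $\lambda_1 = (1 + \sqrt{5})/2$ and, because the matrix is symmetric, $(\lambda_1, 1)$ serves as both a column and a row eigenvector. It also shows the degenerate case $RS_0 = 0$ for the matrix of (2.1), where $\lambda_1 = 2$, $C = (2, 1)^T$ and $R = (1, -1)$.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def iterate_state(A, S0, n):
    """S_n = A^n S_0 by repeated multiplication (exact when all entries are integers)."""
    S = list(S0)
    for _ in range(n):
        S = [dot(row, S) for row in A]
    return S


def predicted_limit(R, C, S0) -> List[float]:
    """a C with a = (R S_0) / (R C), the limit of S_n / lambda_1^n in Theorem 2.4.2."""
    a = dot(R, S0) / dot(R, C)
    return [a * c for c in C]


phi = (1 + math.sqrt(5)) / 2
A_fib = [[1, 1], [1, 0]]
n = 60
ratio = [x / phi ** n for x in iterate_state(A_fib, [1, 0], n)]
print(ratio, predicted_limit([phi, 1], [phi, 1], [1, 0]))  # both close to (phi, 1) / sqrt(5)

# Recurrence (2.1) with S_0 = (1, 1)^T: R S_0 = 0, and indeed s_n = 1 for every n,
# which grows more slowly than 2^n.
print(predicted_limit([1, -1], [2, 1], [1, 1]), iterate_state([[3, -2], [1, 0]], [1, 1], 10))
```

The Fibonacci run reproduces the familiar fact that $f_n / \lambda_1^n \to 1/\sqrt{5}$, computed here without finding the coefficient of the second eigenvalue.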
2.5 Exercises 


Ex 2.1. Show that every initial value problem has a unique solution. 


Ex 2.2. Use induction to verify the closed formulas for the recurrence (2.1) 
under each of the following initial conditions: $S_0 = (3, 2)^T$; $S_0 = (1, 3)^T$. 


Ex 2.3. Suppose that the sequence $(s_n)$ satisfies the Fibonacci recurrence 
and $s_3 = 5$, $s_6 = -2$. What is $s_4$? What is $s_{19}$? What are the initial values 
$s_0$ and $s_1$? 


Ex 2.4. Construct an argument that concludes there is a sequence $(s_n)$ 
that satisfies the Fibonacci recurrence for which sg9 = 23 and sj999 = 56. 


Ex 2.5. Show directly that any linear combination of solutions to (HL) is 
itself a solution. 


Ex 2.6. For a fixed polynomial $Q$ in $k \ge 2$ variables consider the set $V$ of 
all sequences $(s_n)$ that are solutions to the $k^{\text{th}}$ order recurrence 

$$s_{n+k} = Q(s_n, s_{n+1}, \ldots, s_{n+k-1}).$$

If $V$ is a vector space, show that $Q$ must be linear with zero constant term. 


We know that the set of all sequences of complex numbers forms a 
complex vector space under the operations of addition and scaling. Applying 
the definition of linear independence (refer to Appendix C) to this vector 
space, we see that sequences $\phi_1(n), \ldots, \phi_k(n)$ are linearly independent 
iff 

$$b_1 \phi_1(n) + \cdots + b_k \phi_k(n) = 0 \text{ for all } n \ge 0 \implies b_1 = \cdots = b_k = 0.$$


Ex 2.7. Show that every $k^{\text{th}}$ order homogeneous linear recurrence has $k$ 
linearly independent solutions. 


Ex 2.8. Let $\phi_1(n), \ldots, \phi_k(n)$ be $k$ linearly independent solutions to a $k^{\text{th}}$ 
order homogeneous linear recurrence. Show that the solution to any initial 
value problem for this recurrence can be written in the form $\sum b_i \phi_i(n)$. 


Ex 2.9. Let $\phi_1(n), \ldots, \phi_k(n)$ be solutions to a $k^{\text{th}}$ order linear recurrence. 
Show that the following three statements are equivalent: 

(a) $\sum b_i \phi_i(n)$ is the zero sequence. 

(b) For all $0 \le n < k$, $\sum b_i \phi_i(n) = 0$. 

(c) $\sum b_i \phi_i(n) = 0$ for $k$ consecutive values of $n$. 


Ex 2.10. Use diagonalization of the Fibonacci matrix to obtain Binet’s 
Formula (1.5), the closed form for the Fibonacci numbers. 
Ex 2.11. Let $s_{n+2} = c_1 s_{n+1} + c_2 s_n$ be a second-order recurrence with 
eigenvalues $\lambda_1 = (c_1 + \sqrt{D})/2$ and $\lambda_2 = (c_1 - \sqrt{D})/2$, where $D = c_1^2 + 4c_2$. 
If $D \ne 0$, show that the sequence whose $n^{\text{th}}$ term is 

$$s_n = \frac{1}{\sqrt{D}} \Bigl[\alpha_0 c_2 \bigl(\lambda_1^{n-1} - \lambda_2^{n-1}\bigr) + \alpha_1 \bigl(\lambda_1^n - \lambda_2^n\bigr)\Bigr]$$

is the unique solution with initial conditions $s_0 = \alpha_0$, $s_1 = \alpha_1$. 


Ex 2.12. Show that the Vandermonde matrix associated with $\lambda_1, \ldots, \lambda_k$ 
is invertible iff the $\lambda_i$ are distinct. 

Hint: Premultiply the matrix by a row vector where the product for each 
entry is interpreted as the evaluation of a polynomial at $\lambda_i$. 


Ex 2.13. Use the product rule for differentiation and induction to sub- 
stantiate identity (2.11). 


Ex 2.14. Solve the initial value problem 

$$s_{n+3} = 2 s_{n+2} + 5 s_{n+1} - 6 s_n, \qquad s_0 = 9,\ s_1 = -18,\ s_2 = 66.$$


Ex 2.15. Check that each sequence whose $n^{\text{th}}$ term has the form given in 
(2.18) satisfies the recurrence (HL). 


Ex 2.16. Use Theorem 2.3.6 to find the solution to 

$$s_{n+4} = 8 s_{n+2} - 16 s_n, \qquad s_0 = -1,\ s_1 = 8,\ s_2 = 4,\ s_3 = 16.$$


Ex 2.17. Let $\lambda_1, \ldots, \lambda_t$ be distinct complex numbers and let $m_1, \ldots, m_t$ 
be any positive integers that sum to $k$. Let $V$ be the $k \times k$ generalized 
Vandermonde matrix associated with this data, the matrix $V$ whose 
columns are the vectors given in (2.17). 

(a) Construct an argument that proves $V$ is invertible. 

(b) If $A$ is the companion matrix for 

$$p(x) = (x - \lambda_1)^{m_1} (x - \lambda_2)^{m_2} \cdots (x - \lambda_t)^{m_t},$$

compute $V^{-1} A V$. 

(c) For $1 \le i \le t$, $1 \le j \le m_i$, define $Q(x) = \dfrac{p(x)}{(x - \lambda_i)^j}$, and expand it 
as a polynomial of degree $k - 1$ in $x$: 

$$Q(x) = b_0 + b_1 x + \cdots + b_{k-1} x^{k-1}.$$

Show that $\deg(Q) = k - j$. 
(d) Show that 


0 ifj=t, 
(bo, b1,- wes OK-1) yf) ta J 
(e) Compute the precise value of the dot product in (d). 
(f) Use the information from the previous parts to construct the inverse 
of V. 


Ex 2.18. Let $s_{n+2} = c_1 s_{n+1} + c_2 s_n$ be a second-order recurrence such 
that $D = c_1^2 + 4c_2$ is zero. Using the techniques of Section 2.3.2, show that 
the sequence with general term 

$$s_n = \alpha_0 \left(\frac{c_1}{2}\right)^n + n\left(\alpha_1 - \frac{\alpha_0 c_1}{2}\right)\left(\frac{c_1}{2}\right)^{n-1}$$

is the unique solution to (HL) with $s_0 = \alpha_0$ and $s_1 = \alpha_1$. 


Ex 2.19. Write a general procedure for solving (HL) given knowledge of 
the roots of the characteristic polynomial and their multiplicities. 


Ex 2.20. Show that 

$$\lim_{n \to \infty} \frac{a(n+1)}{a(n)} = 1$$

holds for any non-zero polynomial $a(x) \in \mathbb{C}[x]$. 
The next two exercises assume the ordering of eigenvalues given in (2.21). 


Ex 2.21. Consider a solution $(s_n)$ to the homogeneous linear recurrence 
$s_{n+k} = c_1 s_{n+k-1} + c_2 s_{n+k-2} + \cdots + c_k s_n$. 

(a) Show that if $|\lambda_1| < 1$, then $\lim_{n \to \infty} s_n = 0$. 

(b) Show that if the coefficients in (HL) are integers and $c_k \ne 0$, then 
$|\lambda_1| \ge 1$. 


Ex 2.22. Let $s_{n+2} = c_1 s_{n+1} + c_2 s_n$ be a second-order recurrence with 
$c_1 \ne 0$ and $c_1^2 + 4c_2 > 0$. 

(a) Show that $\lambda_1$ is strictly dominant. 

(b) Prove that for any $(s_n) \in \mathcal{X}$ the limit $\lim_{n \to \infty} s_n / \lambda_1^n$ exists. Moreover, when 
$s_1 \ne s_0 \lambda_2$, show that this limit must be non-zero. 


Ex 2.23. For each $n \ge 1$ let $D_n$ be the determinant of the $n \times n$ tridiagonal 
matrix 

$$\begin{pmatrix}
a & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & a & 1 & \cdots & 0 & 0 & 0 \\
0 & 1 & a & \cdots & 0 & 0 & 0 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & a & 1 & 0 \\
0 & 0 & 0 & \cdots & 1 & a & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & a
\end{pmatrix},$$

where $a$ is some fixed real number. Show that the sequence $(D_n)$ satisfies 
the second-order recurrence $D_n = a D_{n-1} - D_{n-2}$ with initial conditions 
$D_0 = 1$ and $D_1 = a$. Find the values of $a$ for which $\lim_{n \to \infty} |D_n| = \infty$. 
For what values of $a$ is $D_n = O(n)$? What is the asymptotic size of $D_n$ 
for other values of $a$? Find the values of $a$ for which $(D_n)$ is periodic, and 
decide what periods are possible. 


3 


Finite Difference Equations 


3.1 Linear Difference Equations 


A difference equation is the discrete analog of a differential equation. 
Although differential equations are typically studied earlier in a mathematical 
curriculum, there are many respects in which the theory of difference 
equations is simpler. A finite difference equation has the general form 

$$s_n = \Phi(s_{n-1}, s_{n-2}, \ldots, s_{n-k}, n), \tag{3.1}$$

where $\Phi$ is a fixed complex-valued function and the integer $k$ is called the 
order of the equation. The initial value problem for (3.1) is the problem 
of finding a sequence $(s_n)$ that satisfies (3.1) for a given function $\Phi$ and a 
fixed initial value vector $S_0 = (s_{k-1}, \ldots, s_1, s_0)$. An initial value problem 
has exactly one solution, because (as in Chapter 2) for any fixed $k$ initial values 
$s_0, \ldots, s_{k-1}$, the $k^{\text{th}}$ term is specified by $s_k = \Phi(s_{k-1}, s_{k-2}, \ldots, s_0, k)$, and 
all successive values are similarly found. From this we obtain the following 
result. 


Theorem 3.1.1 (Existence and uniqueness theorem). Every initial 
value problem for a finite difference equation has a unique solution. 


The essential ingredient in the proof of this result is the assumption that 
$\Phi$ is a function, that is, that $\Phi$ returns exactly one value for a given input. 

The function $\Phi$ is called a linear function in the variables $s_{n-1}, \ldots, s_{n-k}$ if there exist $k+1$ complex-valued functions $g_1, \ldots, g_k$ and $\psi$ defined on the natural numbers such that for all $n \ge k$,
$$\Phi(s_{n-1}, s_{n-2}, \ldots, s_{n-k}, n) = g_1(n)s_{n-1} + \cdots + g_k(n)s_{n-k} + \psi(n),$$
where $\psi$ is called the input or forcing function. When $\Phi$ is a linear function, (3.1) is called a linear difference equation. For the special case in which the coefficients $g_1, \ldots, g_k$ are constants, (3.1) becomes
$$s_n = c_1 s_{n-1} + \cdots + c_k s_{n-k} + \psi(n) \quad \text{for } c_1, \ldots, c_k \in \mathbb{C},$$
and it is called a constant coefficient equation.

In this chapter we study linear constant coefficient difference equations, and we will simply refer to them as difference equations or recurrences. Every such recurrence can be written in the form
$$s_n - c_1 s_{n-1} - c_2 s_{n-2} - \cdots - c_k s_{n-k} = \psi(n) \quad \text{for } n \ge k, \tag{L}$$
for some complex-valued function $\psi$ and constants $c_1, \ldots, c_k$, where $c_k$ is non-zero. The homogeneous linear constant coefficient equation (HL) studied in Chapter 2 is the special case in which the input function $\psi$ is the zero function.


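3.1.1 First-order equations

A first-order recurrence has the form
$$s_n = \lambda s_{n-1} + \psi(n) \quad \text{for all } n \ge 1, \tag{L1}$$
where the function $\psi$ is defined for all positive natural numbers and $\lambda \ne 0$. When we consider the associated initial value problem for fixed $s_0 = \alpha_0$, we have
$$\begin{aligned}
s_0 &= \alpha_0,\\
s_1 &= \lambda s_0 + \psi(1) = \alpha_0\lambda + \psi(1),\\
s_2 &= \lambda s_1 + \psi(2) = \alpha_0\lambda^2 + \lambda\psi(1) + \psi(2),\\
s_3 &= \lambda s_2 + \psi(3) = \alpha_0\lambda^3 + \lambda^2\psi(1) + \lambda\psi(2) + \psi(3),
\end{aligned}$$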

and this gives the following form for the solution.

Theorem 3.1.2. The initial value problem
$$s_n = \lambda s_{n-1} + \psi(n), \qquad s_0 = \alpha_0, \tag{3.2}$$
always has the unique solution $(s_n)$ with general term
$$s_n = \alpha_0\lambda^n + \sum_{i=1}^{n} \psi(i)\lambda^{n-i} \tag{3.3}$$
(where the sum is defined to be zero when $n = 0$).
Proof. Since we already know that (3.2) has a unique solution, we will show by induction on $n$ that the solution has the form given in (3.3).

For $n = 0$, observe that (3.3) reduces to $s_0 = \alpha_0$, which is the initial condition. For the induction step, assume that (3.3) does satisfy (3.2) for all $n = 0, \ldots, K-1$, and then use the defining equation (3.2) to find $s_K$ from $s_{K-1}$ and $\psi(K)$:
$$\begin{aligned}
s_K &= \lambda s_{K-1} + \psi(K)\\
&= \lambda\Big[\alpha_0\lambda^{K-1} + \sum_{i=1}^{K-1}\psi(i)\lambda^{K-1-i}\Big] + \psi(K)\\
&= \alpha_0\lambda^{K} + \sum_{i=1}^{K}\psi(i)\lambda^{K-i},
\end{aligned}$$
as required. □
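As a sanity check on (3.3), the sketch below iterates (3.2) directly and compares every term with the closed form. It uses exact rational arithmetic (`fractions.Fraction`), so the comparison is an equality rather than a floating-point tolerance. The helper names and the sample data $\lambda = 1/2$, $\alpha_0 = 3$, $\psi(n) = n^2$ are illustrative choices.

```python
from fractions import Fraction


def iterate_first_order(lam, alpha0, psi, n_max):
    """s_0 = alpha0 and s_n = lam * s_{n-1} + psi(n) for n = 1, ..., n_max, as in (3.2)."""
    s = [alpha0]
    for n in range(1, n_max + 1):
        s.append(lam * s[-1] + psi(n))
    return s


def closed_form(lam, alpha0, psi, n):
    """Formula (3.3): alpha0 * lam^n + sum_{i=1}^{n} psi(i) * lam^(n-i)."""
    return alpha0 * lam ** n + sum(psi(i) * lam ** (n - i) for i in range(1, n + 1))


lam, alpha0 = Fraction(1, 2), Fraction(3)
psi = lambda n: n * n
seq = iterate_first_order(lam, alpha0, psi, 15)
assert all(seq[n] == closed_form(lam, alpha0, psi, n) for n in range(16))
```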
This proof is remarkably simple when compared to the corresponding theorem for differential equations. To solve a first-order initial value problem in differential equations,
$$D(s(t)) + c\,s(t) = \psi(t), \qquad s(0) = \alpha_0,$$
it’s necessary to multiply through by an “integrating factor” $e^{ct}$ so that the equation becomes
$$D(s(t))\,e^{ct} + c\,s(t)\,e^{ct} = D\big(e^{ct}s(t)\big) = e^{ct}\psi(t),$$
which has the solution
$$s(t) = e^{-ct}\left(\int_0^t e^{cu}\psi(u)\,du\right) + e^{-ct}\alpha_0,$$
provided the integral exists. From this we see several ways in which differential equations are more difficult: the existence of a solution is not guaranteed, finding a solution is harder, and some theory is needed to deal with uniqueness.

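Let’s look at the simplest example of a nonhomogeneous first-order recurrence, when the input $\psi(n) = c$ is a constant,
$$s_n = \lambda s_{n-1} + c, \qquad s_0 = \alpha_0. \tag{3.4}$$
From (3.3) we have
$$s_n = \alpha_0\lambda^n + c\,(\lambda^{n-1} + \lambda^{n-2} + \cdots + 1),$$
which can be computed to be
$$s_n = \begin{cases} \alpha_0\lambda^n + c\,\dfrac{\lambda^n - 1}{\lambda - 1} & \text{if } \lambda \ne 1,\\[1ex] \alpha_0 + cn & \text{if } \lambda = 1. \end{cases}$$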
The difficulty with the form of solution in Equation 3.3 is that it contains a summation, and we would prefer solutions without summations. As the example shows, some summations are easy to replace. In particular, any summation of the form
$$\sum_{i=0}^{n} \gamma^i$$
(which is usually called a geometric series) can easily be replaced by a closed form formula. More generally, if $\psi(n)$ can be written as
$$\psi(n) = b_1\gamma_1^n + b_2\gamma_2^n + \cdots + b_l\gamma_l^n,$$
then
$$\begin{aligned}
\sum_{i=1}^{n}\psi(i)\lambda^{n-i} &= \sum_{i=0}^{n-1}\psi(n-i)\lambda^{i}
= \sum_{i=0}^{n-1}\sum_{j=1}^{l} b_j\gamma_j^{n-i}\lambda^{i}
= \sum_{j=1}^{l} b_j\gamma_j^{n}\sum_{i=0}^{n-1}\left(\frac{\lambda}{\gamma_j}\right)^{i}\\
&= \sum_{j=1}^{l} b_j\gamma_j\,\frac{\gamma_j^{n} - \lambda^{n}}{\gamma_j - \lambda} \qquad (\text{if } \gamma_j \ne \lambda \text{ for each } j).
\end{aligned}$$
Of course, a similar formula is possible when some $\gamma_j$ satisfies $\gamma_j = \lambda$. Further, formulas that are not much more complicated are possible when $\psi(n)$ has the form
$$p_1(n)\gamma_1^n + p_2(n)\gamma_2^n + \cdots + p_l(n)\gamma_l^n,$$
where each $p_j(n)$ is a polynomial in $n$. Instead of pursuing these summations in the context of first-order equations, we will move on to $k$th order equations, and in Section 3.3 we will deal with inputs that have this special form.
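The closed form for exponential inputs can be checked numerically. In this sketch an input $\psi(n) = \sum_j b_j\gamma_j^n$ is represented as a list of pairs `(b_j, gamma_j)`, which is an assumption of ours. The helper names are hypothetical, and the closed form is used only when every $\gamma_j \ne \lambda$.

```python
from fractions import Fraction


def forced_sum_direct(lam, terms, n):
    """sum_{i=1}^{n} psi(i) * lam^(n-i), where psi(i) = sum_j b_j * gamma_j^i."""
    psi = lambda i: sum(b * g ** i for b, g in terms)
    return sum(psi(i) * lam ** (n - i) for i in range(1, n + 1))


def forced_sum_closed(lam, terms, n):
    """sum_j b_j * gamma_j * (gamma_j^n - lam^n) / (gamma_j - lam); requires gamma_j != lam."""
    total = Fraction(0)
    for b, g in terms:
        if g == lam:
            raise ValueError("this closed form assumes gamma_j != lambda")
        total += Fraction(b) * g * (Fraction(g) ** n - Fraction(lam) ** n) / (g - lam)
    return total
```

With a single pair $(c, 1)$, the closed form reduces to $c(\lambda^n - 1)/(\lambda - 1)$. This is the summation part of the constant-input case (3.4).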


3.2 General and Particular Solutions

Because our difference equations are linear, we can conveniently rewrite
$$s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} + \psi(n)$$
by using
$$L[s_n] = s_n - c_1 s_{n-1} - c_2 s_{n-2} - \cdots - c_k s_{n-k}$$
and writing
$$L[s_n] = \psi(n).$$
When we consider solutions to the homogeneous version of this recurrence, $L[s_n] = 0$, we have
$$\text{if } L[x_n] = 0 \text{ and } L[y_n] = 0, \text{ then } L[x_n + y_n] = L[x_n] + L[y_n] = 0 \text{ and } L[c\,x_n] = c\,L[x_n] = 0.$$
Because of these properties we call $L[\cdot]$ a linear operator. (Refer to Appendix C.)

If we have $k$ linearly independent sequences
$$(x_n^{(1)}), (x_n^{(2)}), \ldots, (x_n^{(k)})$$
such that
$$L[x_n^{(1)}] = \cdots = L[x_n^{(k)}] = 0,$$
then for every choice of constants $\alpha_1, \ldots, \alpha_k$,
$$L\Big[\sum_{i=1}^{k}\alpha_i x_n^{(i)}\Big] = 0. \tag{3.5}$$
Since we can choose the $\alpha_i$'s, we can force $\sum_{i=1}^{k}\alpha_i x_n^{(i)}$ to satisfy any set of $k$ initial conditions. Specifically, if the initial conditions are $s_0, s_1, \ldots, s_{k-1}$, then
$$\begin{aligned}
\sum_{i=1}^{k}\alpha_i x_0^{(i)} &= s_0,\\
\sum_{i=1}^{k}\alpha_i x_1^{(i)} &= s_1,\\
&\;\;\vdots\\
\sum_{i=1}^{k}\alpha_i x_{k-1}^{(i)} &= s_{k-1},
\end{aligned} \tag{3.6}$$
and by solving this system of $k$ linear equations in $k$ unknowns we can calculate $\alpha_1, \ldots, \alpha_k$. The assumption that the sequences are linearly independent assures us that this set of linear equations has a unique solution.

Linearity will also allow us to break the problem of solving
$$L[s_n] = \psi(n)$$
into two parts. We let $s_n = h_n + v_n$, with
$$L[h_n] = 0 \quad\text{and}\quad L[v_n] = \psi(n),$$
and by linearity
$$L[s_n] = L[h_n + v_n] = \psi(n).$$
We are not yet ready to use the initial conditions. We want to find a $v_n$ that satisfies
$$L[v_n] = \psi(n),$$
but $v_n$ does not need to satisfy the initial conditions. We call such a $v_n$ a particular solution. In the next section we will look at some methods for actually finding a particular solution.

As we saw above, if we can find $k$ linearly independent sequences
$$(x_n^{(1)}), (x_n^{(2)}), \ldots, (x_n^{(k)})$$
such that
$$L[x_n^{(1)}] = \cdots = L[x_n^{(k)}] = 0,$$
then for every choice of $\alpha_i$'s we have
$$L\Big[\sum_{i=1}^{k}\alpha_i x_n^{(i)}\Big] = 0.$$
Since this summation represents a whole family of solutions to the homogeneous difference equation, we call
$$\sum_{i=1}^{k}\alpha_i x_n^{(i)}$$
the general solution to the homogeneous difference equation and denote this solution by $(g_n)$. Then
$$L[g_n + v_n] = L[g_n] + L[v_n] = 0 + \psi(n) = \psi(n).$$
Conversely, the difference of any two solutions of $L[s_n] = \psi(n)$ satisfies the homogeneous equation. We can therefore state that every solution of the difference equation can be written as the sum of the general solution and a particular solution. Actually, this does not yet solve the initial value problem, because $g_n$ still has $k$ unspecified coefficients, $\alpha_1, \alpha_2, \ldots, \alpha_k$. If we choose these coefficients such that $g_n + v_n$ satisfies the initial conditions, we will be done. But this is the same process as solving the system of linear equations (3.6), with the difference that we subtract the initial values of $v_n$ from the original initial conditions. The system of equations to be solved is
$$\begin{aligned}
\sum_{i=1}^{k}\alpha_i x_0^{(i)} &= s_0 - v_0,\\
\sum_{i=1}^{k}\alpha_i x_1^{(i)} &= s_1 - v_1,\\
&\;\;\vdots\\
\sum_{i=1}^{k}\alpha_i x_{k-1}^{(i)} &= s_{k-1} - v_{k-1}.
\end{aligned}$$
Letting $(h_n)$ be the solution with these computed values for the $\alpha_i$'s gives
$$h_n = \sum_{i=1}^{k}\alpha_i x_n^{(i)},$$
and $(h_n + v_n)$ satisfies the initial conditions.
We may also view this solution process as follows.

(a) Find any $(v_n)$ that satisfies $L[v_n] = \psi(n)$.

(b) Find $(h_n)$ such that $(h_n + v_n)$ satisfies the initial conditions and $L[h_n] = 0$.

In this process the second step is simply solving an initial value problem for a homogeneous difference equation.
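Steps (a) and (b) can be sketched in Python for a small hypothetical example: $s_n = 5s_{n-1} - 6s_{n-2} + 4^n$ with $s_0 = 1$ and $s_1 = 0$. The homogeneous solutions $2^n$ and $3^n$ and the particular solution $v_n = 8\cdot 4^n$ were found by hand; substitution confirms that $8\cdot 4^n - 5\cdot 8\cdot 4^{n-1} + 6\cdot 8\cdot 4^{n-2} = 4^n$. The code carries out only step (b), solving the shifted system with exact Gauss-Jordan elimination. The function names are illustrative.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_initial_values(basis, v, initial):
    """Step (b): pick alpha_1..alpha_k so that h_n + v_n matches s_0, ..., s_{k-1}.

    basis   -- k homogeneous solutions, each a function n -> x_n^(i)
    v       -- a particular solution n -> v_n (step (a), found separately)
    initial -- [s_0, ..., s_{k-1}]
    Returns the function n -> h_n + v_n.
    """
    k = len(basis)
    A = [[basis[i](n) for i in range(k)] for n in range(k)]  # row n: x_n^(1), ..., x_n^(k)
    rhs = [initial[n] - v(n) for n in range(k)]              # s_n - v_n
    alpha = solve_linear(A, rhs)
    return lambda n: sum(a * x(n) for a, x in zip(alpha, basis)) + v(n)


# Hypothetical example: s_n = 5 s_{n-1} - 6 s_{n-2} + 4^n, s_0 = 1, s_1 = 0.
s = fit_initial_values([lambda n: 2 ** n, lambda n: 3 ** n], lambda n: 8 * 4 ** n, [1, 0])
```

Here the fitted coefficients are $\alpha_1 = 11$ and $\alpha_2 = -18$, so $s_n = 11\cdot 2^n - 18\cdot 3^n + 8\cdot 4^n$, which agrees with direct iteration of the recurrence.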


3.2.1 Finding a particular solution via summation

We have proved (Theorem 3.1.2) that the sequence $(v_n)$ defined by
$$v_0 = 0, \qquad v_n = \sum_{i=1}^{n}\psi(i)\lambda^{n-i} = (\psi(1), \ldots, \psi(n)) \cdot (\lambda^{n-1}, \ldots, \lambda^{0})$$
(where the last expression should be read as a dot product of two vectors) is a particular solution to the first-order equation
$$s_n = \lambda s_{n-1} + \psi(n).$$
The solution can be rewritten as the dot product
$$v_n = (\psi(1), \ldots, \psi(n)) \cdot (t_{n-1}, \ldots, t_0),$$
where $t_n = \lambda^n$ is the solution to the homogeneous initial value problem
$$s_n = \lambda s_{n-1} \quad\text{with}\quad s_0 = 1.$$
We can generalize this formula to general $k$th order difference equations.
Theorem 3.2.1. Let $(t_n)$ be the solution to the $k$th order homogeneous initial value problem
$$s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} \quad\text{with}\quad (s_{k-1}, \ldots, s_0) = (1, 0, \ldots, 0).$$
Then for any sequence $\psi(n)$, the sequence $(v_n)$ given by
$$v_0 = \cdots = v_{k-1} = 0, \qquad v_n = (\psi(k), \ldots, \psi(n)) \cdot (t_{n-1}, \ldots, t_{k-1}) \quad (n \ge k)$$
is a particular solution to the difference equation
$$s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} + \psi(n). \tag{3.7}$$
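Proof. Since $v_0 = \cdots = v_{k-1} = 0$, the value $v_k = \psi(k)t_{k-1} = \psi(k)$ does satisfy (3.7) for $n = k$. For $n > k$, since $t_0 = \cdots = t_{k-2} = 0$, the sum
$$c_1 v_{n-1} + c_2 v_{n-2} + \cdots + c_k v_{n-k}$$
equals
$$\begin{aligned}
&c_1\big(t_{n-2}\psi(k) + \cdots + t_k\psi(n-2) + t_{k-1}\psi(n-1)\big)\\
+\;&c_2\big(t_{n-3}\psi(k) + \cdots + t_{k-1}\psi(n-2) + t_{k-2}\psi(n-1)\big)\\
&\;\;\vdots\\
+\;&c_k\big(t_{n-1-k}\psi(k) + \cdots + t_1\psi(n-2) + t_0\psi(n-1)\big).
\end{aligned}$$
Adding down the “columns”, and using the fact that $(t_n)$ satisfies the homogeneous recurrence, this becomes
$$\begin{aligned}
\Big(\sum_{i=1}^{k} c_i t_{n-1-i}\Big)\psi(k) + \cdots + \Big(\sum_{i=1}^{k} c_i t_{k-i}\Big)\psi(n-1)
&= t_{n-1}\psi(k) + \cdots + t_k\psi(n-1)\\
&= (t_{n-1}, \ldots, t_k, t_{k-1}) \cdot (\psi(k), \ldots, \psi(n-1), \psi(n)) - \psi(n)\\
&= v_n - \psi(n)
\end{aligned}$$
by the definition of $v_n$ (recall that $t_{k-1} = 1$). Hence, for every $n \ge k$, $(v_n)$ satisfies
$$v_n = c_1 v_{n-1} + \cdots + c_k v_{n-k} + \psi(n). \qquad \square$$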
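Theorem 3.2.1 translates directly into code. First compute $(t_n)$ from the initial vector $(1, 0, \ldots, 0)$, then form each $v_n$ as a dot product. The sketch below uses hypothetical helper names, with `c = [c_1, ..., c_k]` and `psi` any Python function of $n$.

```python
def impulse_response(c, n_max):
    """t_0, ..., t_{n_max} for s_n = c_1 s_{n-1} + ... + c_k s_{n-k}
    with (s_{k-1}, ..., s_0) = (1, 0, ..., 0)."""
    k = len(c)
    t = [0] * (k - 1) + [1]
    for n in range(k, n_max + 1):
        t.append(sum(c[j - 1] * t[n - j] for j in range(1, k + 1)))
    return t[: n_max + 1]


def particular_by_convolution(c, psi, n_max):
    """v_0 = ... = v_{k-1} = 0 and, for n >= k,
    v_n = (psi(k), ..., psi(n)) . (t_{n-1}, ..., t_{k-1}), as in Theorem 3.2.1."""
    k = len(c)
    t = impulse_response(c, n_max)
    v = [0] * k
    for n in range(k, n_max + 1):
        # psi(j) is paired with t_{n+k-1-j}: psi(k) with t_{n-1}, ..., psi(n) with t_{k-1}
        v.append(sum(psi(j) * t[n + k - 1 - j] for j in range(k, n + 1)))
    return v[: n_max + 1]
```

For the recurrence $s_n = 2s_{n-1} - s_{n-2} + 2^n$, `impulse_response([2, -1], 6)` returns `[0, 1, 2, 3, 4, 5, 6]`, matching $t_n = n$.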


Let’s find a particular solution to the second-order equation
$$s_n = 2s_{n-1} - s_{n-2} + 2^n. \tag{3.8}$$
As in the theorem, let $(t_n)$ be the solution to the associated homogeneous initial value problem
$$s_n = 2s_{n-1} - s_{n-2}; \qquad (s_1, s_0) = (1, 0).$$
Its first few terms are $t_0 = 0, t_1 = 1, t_2 = 2, t_3 = 3, \ldots$, and $t_n = n$ can be proved in general by induction. Then $\psi(n) = 2^n$ gives
$$\begin{aligned}
v_n &= (\psi(2), \ldots, \psi(n)) \cdot (n-1, \ldots, 1)\\
&= (2^2, \ldots, 2^n) \cdot (n-1, \ldots, 1)\\
&= \sum_{j=2}^{n} 2^j(n+1-j)
\end{aligned}$$
as a particular solution of (3.8). In this case it’s relatively easy to get a closed form for $(v_n)$ because the sequence satisfies a first-order equation, since
$$\begin{aligned}
v_n &= (2^2, \ldots, 2^{n-1}, 2^n) \cdot (1, \ldots, 1, 1) + (2^2, \ldots, 2^{n-1}) \cdot (n-2, \ldots, 1)\\
&= 2^2(1 + \cdots + 2^{n-2}) + v_{n-1} = 2^2(2^{n-1} - 1) + v_{n-1}.
\end{aligned}$$
Therefore, using $\lambda = 1$ and $\psi(n) = 4(2^{n-1} - 1)$, $(v_n)$ is a solution to $v_n = \lambda v_{n-1} + \psi(n)$. From the result for first-order equations (with $v_0 = 0$),
$$v_n = \sum_{i=1}^{n} 4(2^{i-1} - 1) = 2^{n+2} - 4n - 4$$
is a particular solution to (3.8). Notice that determining this closed form relied on being able to compute $t_n$ easily, as well as on seeing a linear relationship between consecutive elements of the constructed particular solution $(v_n)$.
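The closed form can be compared with the convolution sum numerically. Solving $v_n = v_{n-1} + 4(2^{n-1} - 1)$ with $v_0 = 0$ by (3.3) gives $v_n = 4\sum_{i=1}^{n}(2^{i-1} - 1) = 2^{n+2} - 4n - 4$; the simplification is our own. The short sketch below checks that both expressions agree and that they satisfy (3.8). The function names are illustrative.

```python
def v_sum(n):
    """Convolution form from Theorem 3.2.1: sum_{j=2}^{n} 2^j (n + 1 - j); v_0 = v_1 = 0."""
    return sum(2 ** j * (n + 1 - j) for j in range(2, n + 1))


def v_closed(n):
    """Closed form 4 * sum_{i=1}^{n} (2^(i-1) - 1) = 2^(n+2) - 4n - 4."""
    return 2 ** (n + 2) - 4 * n - 4


assert all(v_sum(n) == v_closed(n) for n in range(30))
# Both satisfy (3.8): v_n - 2 v_{n-1} + v_{n-2} = 2^n.
assert all(v_closed(n) - 2 * v_closed(n - 1) + v_closed(n - 2) == 2 ** n for n in range(2, 30))
```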


3.3 A Special Class of Linear Recurrences

We will now generalize the last section and develop an efficient method for obtaining a particular solution for a frequently occurring class of linear recurrences: those whose forcing function is $\psi(n) = \lambda^n p(n)$, where $\lambda$ is a constant and $p(n)$ is a polynomial.

Theorem 3.3.1. For any polynomial $p(x)$ and any constant $\lambda$, consider the linear recurrence
$$s_n = c_1 s_{n-1} + \cdots + c_k s_{n-k} + \lambda^n p(n).$$
Let $\lambda_1, \ldots, \lambda_r$ be the eigenvalues of the recurrence with respective multiplicities $m_1, \ldots, m_r$, and define $\delta$ by
$$\delta = \begin{cases} 0 & \text{if } \lambda \notin \{\lambda_1, \ldots, \lambda_r\},\\ m_i & \text{if } \lambda = \lambda_i. \end{cases}$$
If $\deg(p) = d$, then there exists a polynomial $q(x)$ with $\deg(q) \le d$ such that $v_n = \lambda^n n^\delta q(n)$ is a particular solution of the recurrence. Moreover, the polynomial $q(x)$ can be calculated in $O(d^2)$ operations.


Proof. Our proof actually constructs the polynomial $q$. We may assume that $\lambda \ne 0$, because if $\lambda = 0$, the equation is homogeneous, and $v_n = 0$ satisfies the recurrence. Then $\lambda^n n^\delta q(n)$ is a solution exactly when, for all integers $n \ge k$,
$$\lambda^n n^\delta q(n) - c_1\lambda^{n-1}(n-1)^\delta q(n-1) - c_2\lambda^{n-2}(n-2)^\delta q(n-2) - \cdots - c_k\lambda^{n-k}(n-k)^\delta q(n-k) = \lambda^n p(n).$$
Dividing both sides by the non-zero $\lambda^{n-k}$, the equation becomes a polynomial equation in $n$, which holds for an infinite set of natural numbers. So in fact the equation is the polynomial identity
$$\lambda^k x^\delta q(x) - c_1\lambda^{k-1}(x-1)^\delta q(x-1) - \cdots - c_k (x-k)^\delta q(x-k) = \lambda^k p(x).$$
Recall that $p(x) = p_d x^d + \cdots + p_1 x + p_0$ is a fixed polynomial. Our goal is to construct coefficients $q_d, \ldots, q_1, q_0$ such that $q(x) = q_d x^d + \cdots + q_1 x + q_0$ satisfies this identity. Computing a few coefficients, the coefficient of $x^{\delta+d}$ on the left side is
$$\lambda^k q_d - c_1\lambda^{k-1}q_d - \cdots - c_k q_d,$$
which is $\mathrm{ch}(\lambda)\,q_d$ (recall that $\mathrm{ch}(x) = x^k - c_1x^{k-1} - \cdots - c_k$). The coefficient of $x^{\delta+d-1}$ is
$$\begin{aligned}
\lambda^k q_{d-1} &- c_1\lambda^{k-1}\Big((-1)\tbinom{\delta+d}{1}q_d + q_{d-1}\Big)\\
&- c_2\lambda^{k-2}\Big((-2)\tbinom{\delta+d}{1}q_d + q_{d-1}\Big) - \cdots - c_k\Big((-k)\tbinom{\delta+d}{1}q_d + q_{d-1}\Big),
\end{aligned}$$
which we can write as $-t_1\binom{\delta+d}{1}q_d + t_0 q_{d-1}$ using $t_0 = \mathrm{ch}(\lambda)$ and $t_1 = \sum_{j=1}^{k} c_j(-j)\lambda^{k-j}$. In general, define $t_i = \sum_{j=1}^{k} c_j(-j)^i\lambda^{k-j}$ for all $i \ge 1$. It can be shown that the coefficient of $x^{\delta+d-j}$ on the left side is
$$-t_j\binom{\delta+d}{j}q_d - \cdots - t_1\binom{\delta+d-(j-1)}{1}q_{d-(j-1)} + t_0 q_{d-j}.$$

We first consider the case in which $\lambda$ is not an eigenvalue. Then $t_0 = \mathrm{ch}(\lambda)$ is non-zero, $\delta = 0$, and the requirement becomes the dot product
$$\Big(-t_j\binom{d}{j}, \ldots, -t_1\binom{d-(j-1)}{1}, t_0\Big) \cdot (q_d, \ldots, q_{d-(j-1)}, q_{d-j}) = \lambda^k p_{d-j}$$
for all $j = 0, \ldots, d$. This is a system of $d+1$ equations in the unknowns $q_d, \ldots, q_1, q_0$. The coefficient matrix is a lower triangular matrix in which each diagonal entry is the non-zero $t_0$, which means that the system has a unique solution. This gives the unique polynomial $q(x)$ whose degree is at most $d$.

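If $\lambda$ is a simple eigenvalue, then $\delta = 1$ and $t_0 = 0$, which means that the coefficient of $x^{d+1}$ on the left side is $q_d t_0 = 0$. Since the right-hand side has degree $d$, its coefficient of $x^{d+1}$ is also $0$. These two facts give us the trivial equation $0 = 0$. We have $d+1$ remaining equations for the $d+1$ coefficients of $q$. These are
$$\Big(-t_j\binom{1+d}{j}, \ldots, -t_1\binom{1+d-(j-1)}{1}\Big) \cdot (q_d, \ldots, q_{d-(j-1)}) = \lambda^k p_{d+1-j}$$
for all $j = 1, \ldots, d+1$. This is again a lower triangular system, and each diagonal entry is now $-t_1$ times a positive integer. Fortunately,
$$k\,\mathrm{ch}(x) - x\,\mathrm{ch}'(x) = \sum_{j=1}^{k} c_j(-j)x^{k-j},$$
which means that
$$k\,\mathrm{ch}(\lambda) - \lambda\,\mathrm{ch}'(\lambda) = t_1.$$
Since $\lambda$ is a simple eigenvalue, $\mathrm{ch}(\lambda) = 0$ but $\mathrm{ch}'(\lambda) \ne 0$. Because $\lambda \ne 0$, it follows that $t_1 \ne 0$, as required to ensure a unique solution.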

What about the remaining cases, in which the multiplicity of $\lambda$ is $m > 1$? Here it seems that we have $m+d+1$ equations for the $d+1$ unknowns $q_d, q_{d-1}, \ldots, q_0$. But $\mathrm{ch}(\lambda) = 0$, and for each $j < m$ the value of $D^{(j)}(\mathrm{ch})$ at $x = \lambda$ is zero. This means that the first $\delta = m$ equations are simply the redundant equation $0 = 0$. The remaining $d+1$ equations will specify the $d+1$ coefficients of $q(n)$. Notice that $t_m$ can be written as a linear combination of $\mathrm{ch}(\lambda), D(\mathrm{ch})(\lambda), \ldots, D^{(m)}(\mathrm{ch})(\lambda)$ with a non-zero coefficient for $D^{(m)}(\mathrm{ch})(\lambda)$ (refer to Exercise 3.6). Since we have assumed that $\lambda$ has multiplicity $m$ as a root of $\mathrm{ch}(x)$,
$$\mathrm{ch}(\lambda) = D(\mathrm{ch})(\lambda) = \cdots = D^{(m-1)}(\mathrm{ch})(\lambda) = 0,$$
and $D^{(m)}(\mathrm{ch})(\lambda) \ne 0$. So $t_m \ne 0$, and the lower triangular system with (positive integer multiples of) $t_m$ along the diagonal yields unique values for the coefficients $q_d, \ldots, q_0$.

Now for the operation count. Since $k$ is a constant, each of the $O(d^2)$ entries in the coefficient matrix can be computed with a constant number of multiplications. Further, because the matrix is in triangular form, the associated system of equations can be solved by forward substitution. That is, $q_d$ can be computed in one division; $q_{d-1}$ is computed using this known value of $q_d$ and one division; in general, each $q_{d-j}$ is computed from the values $q_d, q_{d-1}, \ldots, q_{d-(j-1)}$ using $O(j)$ operations. This means that once the coefficient matrix has been computed in $O(d^2)$ operations, the coefficient set $q_d, \ldots, q_0$ can be found using $O(d^2)$ operations. □
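The construction in the proof can be sketched in Python with exact rational arithmetic. This is an illustration under stated assumptions ($\lambda \ne 0$, integer or rational data), not a reference implementation, and `particular_poly` and `root_multiplicity` are hypothetical names. The code stores $u_0 = t_0 = \mathrm{ch}(\lambda)$ and $u_i = -t_i$ for $i \ge 1$. With that convention the coefficient of $x^{\delta+d-j}$ is $\sum_i u_i\binom{\delta+d-j+i}{i}q_{d-j+i}$, and the triangular system is solved one unknown at a time.

```python
from fractions import Fraction
from math import comb


def root_multiplicity(c, lam):
    """Multiplicity of lam as a root of ch(x) = x^k - c_1 x^(k-1) - ... - c_k,
    found by repeated synthetic division by (x - lam)."""
    coeffs = [Fraction(1)] + [-Fraction(ci) for ci in c]  # highest degree first
    m = 0
    while len(coeffs) > 1:
        quotient, acc = [], Fraction(0)
        for a in coeffs:
            acc = acc * lam + a
            quotient.append(acc)
        if quotient.pop() != 0:  # the last accumulated value is the remainder ch(lam)
            break
        coeffs, m = quotient, m + 1
    return m


def particular_poly(c, lam, p):
    """Return (delta, q) such that v_n = lam^n * n^delta * q(n) solves
    s_n = c_1 s_{n-1} + ... + c_k s_{n-k} + lam^n * p(n).
    p and q are coefficient lists [p_0, ..., p_d], lowest degree first.
    Assumes lam != 0 and exact (integer or rational) inputs."""
    k, d = len(c), len(p) - 1
    lam = Fraction(lam)
    delta = root_multiplicity(c, lam)
    # u_0 = t_0 = ch(lam) and u_i = -t_i, where t_i = sum_j c_j (-j)^i lam^(k-j) for i >= 1
    u = []
    for i in range(delta + d + 1):
        s = sum(Fraction(c[j - 1]) * (-j) ** i * lam ** (k - j) for j in range(1, k + 1))
        u.append(lam ** k - s if i == 0 else -s)
    assert all(x == 0 for x in u[:delta]) and u[delta] != 0
    q = [Fraction(0)] * (d + 1)
    # Match coefficients of x^(delta+d-j) for j = delta + r, r = 0..d.  The system is
    # triangular, so each equation introduces exactly one new unknown, q_{d-r}.
    for r in range(d + 1):
        j = delta + r
        rhs = lam ** k * Fraction(p[d - r])
        for i in range(delta + 1, j + 1):  # these terms involve already-computed q's
            rhs -= u[i] * comb(delta + d - j + i, i) * q[d - j + i]
        q[d - r] = rhs / (u[delta] * comb(delta + d - r, delta))
    return delta, q
```

On the worked examples that follow, `particular_poly([2, -1], 2, [1])` returns `(0, [Fraction(4, 1)])`, that is, $v_n = 2^{n+2}$. Likewise, `particular_poly([4, -4], 2, [1])` returns `(2, [Fraction(1, 2)])`, that is, $v_n = 2^{n-1}n^2$.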


For the second-order recurrence
$$s_n = 2s_{n-1} - s_{n-2} + 2^n,$$
$\lambda = 2$ is not an eigenvalue of the recurrence, which means that $\delta = 0$, and $\deg(q) = 0$ follows from $p(n) = 1$. So the particular solution has the form $v_n = 2^n c$, where $c$ is a constant. To solve for $c$,
$$\begin{aligned}
v_n - 2v_{n-1} + v_{n-2} &= 2^n,\\
2^n c - 2\cdot 2^{n-1}c + 2^{n-2}c = 2^{n-2}c &= 2^n,
\end{aligned}$$
giving $c = 4$. And $v_n = 2^{n+2}$ is a different particular solution from the one we found earlier.
For
$$s_n = 4s_{n-1} - 4s_{n-2} + 1\cdot 2^n,$$
the input function is $\psi(n) = 1\cdot 2^n$; that is, $p(n) = 1$ and $\lambda = 2$. Since 2 is a double root of the characteristic polynomial $x^2 - 4x + 4$, we have $\delta = 2$. We therefore expect a particular solution $v_n$ of the form $v_n = \lambda^n n^\delta q(n) = 2^n n^2 c$, because $p(n)$ is a constant. Substituting this $v_n$ into the equation, we have
$$2^n n^2 c = 4\big(2^{n-1}(n-1)^2 c\big) - 4\big(2^{n-2}(n-2)^2 c\big) + 2^n,$$
and dividing by $2^n$ gives
$$n^2 c = 2(n-1)^2 c - (n-2)^2 c + 1.$$
Equating the coefficients of $n^2$ gives
$$c = 2c - c,$$
which is true but uninformative. Equating coefficients of $n$ gives
$$0 = -4c + 4c,$$
which again is true but uninformative. Finally, equating the constant terms on each side gives
$$0 = 2c - 4c + 1,$$
which implies that $c = 1/2$, and that the particular solution is $v_n = 2^{n-1}n^2$. As above, this particular solution is equivalent to the polynomial identity
$$n^2 = 2(n-1)^2 - (n-2)^2 + 2.$$
This identity can be verified by expanding the terms. Alternatively, since both sides are polynomials of degree at most 2, it suffices to check it at 3 points. For example, choosing $n = 0$, $n = 1$, and $n = 2$ gives the three valid equations
$$\begin{aligned}
0 &= 2 - 4 + 2,\\
1 &= 0 - 1 + 2,\\
4 &= 2 - 0 + 2.
\end{aligned}$$
Appendix A contains more worked problems that use this technique.


3.4 Operator Notation

Operator notation is a convenient way to display some facts about difference equations. Here, we’ll use the notation introduced in Section 3.2 to explain the superposition principle and to show why it was easy to solve the special class of recurrences in Section 3.3.

Superposition means putting one thing on top of another. In our context, it means combining several sequences to obtain a new sequence. For linear difference equations the superposition principle is that
$$L[x_n] = \phi(n) + \psi(n)$$
can be solved by solving the two equations
$$L[y_n] = \phi(n) \quad\text{and}\quad L[z_n] = \psi(n),$$
and then combining these two solutions by letting $x_n = y_n + z_n$, so that
$$L[x_n] = L[y_n + z_n] = L[y_n] + L[z_n] = \phi(n) + \psi(n).$$


Of course, this principle applies to any convenient way in which one can break up the input. Specifically, suppose
$$L[x_n] = \phi(n)$$
and $\phi(n)$ can be written as
$$\phi(n) = \psi_1(n) + \psi_2(n) + \cdots + \psi_r(n).$$
Then one can solve $r$ equations:
$$\text{for } i \text{ from } 1 \text{ to } r, \text{ solve } L[x_n^{(i)}] = \psi_i(n),$$
and sum these to find a solution to the original equation as
$$x_n = \sum_{i=1}^{r} x_n^{(i)}.$$
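A small numerical illustration of superposition, as a sketch. With zero initial values the particular solution depends linearly on the input. Solving separately for two pieces of the input and adding the results therefore gives the same sequence as solving for the whole input. The function name and the example recurrence $x_n = x_{n-1} + x_{n-2} + \psi(n)$ are illustrative choices.

```python
def particular_zero_start(c, psi, n_max):
    """Solve x_n = c_1 x_{n-1} + ... + c_k x_{n-k} + psi(n) with x_0 = ... = x_{k-1} = 0."""
    k = len(c)
    x = [0] * k
    for n in range(k, n_max + 1):
        x.append(sum(c[i - 1] * x[n - i] for i in range(1, k + 1)) + psi(n))
    return x[: n_max + 1]


c = [1, 1]                       # illustrative: x_n = x_{n-1} + x_{n-2} + psi(n)
psi_1 = lambda n: 3 ** n
psi_2 = lambda n: n * n - 7
y = particular_zero_start(c, psi_1, 20)
z = particular_zero_start(c, psi_2, 20)
x = particular_zero_start(c, lambda n: psi_1(n) + psi_2(n), 20)
assert x == [a + b for a, b in zip(y, z)]  # superposition: x_n = y_n + z_n
```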


Operator notation also shows why the special cases of Section 3.3 are easy to deal with. Define the operator $S_\gamma$ by
$$S_\gamma[y_n] = y_n - \gamma\,y_{n-1}.$$
Clearly,
$$S_\gamma[\gamma^n] = \gamma^n - \gamma\cdot\gamma^{n-1} = 0.$$
If one wants to solve the nonhomogeneous difference equation
$$L[x_n] = \gamma^n,$$
then $S_\gamma$ can be used to reduce the nonhomogeneous equation to a homogeneous equation, because
$$S_\gamma\big[L[x_n]\big] = S_\gamma[\gamma^n] = 0.$$
Let us define $L_\gamma$ as the composition of $S_\gamma$ with $L$; that is, for all sequences $(x_n)$,
$$L_\gamma[x_n] = S_\gamma\big[L[x_n]\big].$$
The characteristic polynomial of $L_\gamma$ is simply the product of the characteristic polynomials of $S_\gamma$ and $L$. That is,
$$\mathrm{ch}_{L_\gamma}(\lambda) = (\lambda - \gamma)\cdot\mathrm{ch}_L(\lambda),$$
because the characteristic polynomial of $S_\gamma$ is $(\lambda - \gamma)$.

Now the reason that this special case is so special becomes clear. The special input functions themselves satisfy linear constant coefficient difference equations. Specifically, let $S_\gamma^{d+1}$ be the operator that corresponds to $d+1$ applications of $S_\gamma$. If $\psi(n) = p(n)\gamma^n$, where $p(n)$ is a polynomial of degree $d$, then
$$S_\gamma^{d+1}[\psi(n)] = S_\gamma^{d+1}[p(n)\gamma^n] = 0,$$
because each application of $S_\gamma$ sends $p(n)\gamma^n$ to $\gamma^n\big(p(n) - p(n-1)\big)$, which lowers the degree of the polynomial factor by one. Hence the special nonhomogeneous case
$$L[x_n] = p(n)\gamma^n$$
reduces to the higher order homogeneous equation
$$L_{d+1}[x_n] = 0,$$
where $L_{d+1}$ is $S_\gamma$ composed $d+1$ times with $L$. Of course, the characteristic polynomial of $L_{d+1}$ is
$$\mathrm{ch}_{L_{d+1}}(\lambda) = (\lambda - \gamma)^{d+1}\cdot\mathrm{ch}_L(\lambda).$$
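Both facts can be checked numerically in a sketch with hypothetical helper names. First, $d+1$ applications of $S_\gamma$ annihilate $p(n)\gamma^n$. Second, a solution of $L[x_n] = p(n)\gamma^n$ satisfies, from index $k+d+1$ on, the homogeneous recurrence whose characteristic polynomial is $(\lambda - \gamma)^{d+1}\mathrm{ch}_L(\lambda)$. Polynomials are stored highest degree first, so the coefficients $[1, e_1, \ldots, e_K]$ encode $x_n + e_1x_{n-1} + \cdots + e_Kx_{n-K} = 0$. The sample data ($\gamma = 2$, $p(n) = n^2 + 1$, $L[x_n] = x_n - x_{n-1} - x_{n-2}$) are illustrative.

```python
from math import comb


def apply_S(seq, gamma):
    """S_gamma[y]_n = y_n - gamma * y_{n-1}; the output starts at input index 1."""
    return [seq[n] - gamma * seq[n - 1] for n in range(1, len(seq))]


def poly_mul(a, b):
    """Product of polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def annihilated_char_poly(c, gamma, d):
    """Coefficients of (x - gamma)^(d+1) * ch_L(x), highest degree first,
    where ch_L(x) = x^k - c_1 x^(k-1) - ... - c_k."""
    ch_L = [1] + [-ci for ci in c]
    factor = [comb(d + 1, i) * (-gamma) ** i for i in range(d + 2)]  # (x - gamma)^(d+1)
    return poly_mul(factor, ch_L)


c, gamma, d = [1, 1], 2, 2
forcing = [(n * n + 1) * gamma ** n for n in range(30)]
y = forcing
for _ in range(d + 1):
    y = apply_S(y, gamma)
assert all(val == 0 for val in y)           # S_gamma^(d+1) kills p(n) gamma^n

x = [0, 1]
for n in range(2, 30):
    x.append(x[n - 1] + x[n - 2] + forcing[n])
e = annihilated_char_poly(c, gamma, d)      # [1, e_1, ..., e_K] with K = k + d + 1
K = len(e) - 1
assert all(sum(e[i] * x[n - i] for i in range(K + 1)) == 0 for n in range(K, 30))
```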


This reduction does not change the initial values for $(x_n)$, but since the order of the recurrence has increased, more initial values are needed. If $x_0, \ldots, x_{k-1}$ were the original initial conditions, then the extra initial conditions can be computed from these and from some values of $\psi(n)$. Specifically, if
$$L[x_n] = x_n - \sum_{i=1}^{k} c_i x_{n-i},$$
then $x_k, \ldots, x_{k+d}$ can be computed by
$$\begin{aligned}
x_k &= \sum_{i=1}^{k} c_i x_{k-i} + \psi(k),\\
&\;\;\vdots\\
x_{k+d} &= \sum_{i=1}^{k} c_i x_{k+d-i} + \psi(k+d).
\end{aligned}$$
Since each new initial condition is a linear combination of $k$ previous conditions and one value of $\psi(n)$, each condition can be computed using $O(k)$ operations, and the set of $d+1$ conditions can be computed using $O(k(d+1))$ operations.

While considering the special nonhomogeneous case as a homogeneous equation may be psychologically simplifying, it may be computationally more complex. If one uses one of the standard $O(k^3)$ methods to solve a system of $k$ linear algebraic equations, then converting to a $(k+d+1)$th order difference equation will lead to solving for $k+d+1$ coefficients, and thus to using $O\big((k+d+1)^3\big)$ operations [78]. In contrast, first finding the particular solution, which can be done in $O(d^2)$ operations, and then solving for the $k$ coefficients of a homogeneous solution, which can be done in $O(k^3)$ operations, will take a total of $O(k^3 + d^2)$ operations. This is fewer than the $O\big((k+d+1)^3\big)$ operations of the increased order homogeneous method.


3.5 The Shift Operator on the Space of Sequences

In contrast with the homogeneous recurrences in Chapter 2, when $\psi$ is not the zero function, the set of solutions to (L) is not a vector space. By looking at things slightly differently, the problem of solving these nonhomogeneous recurrences can be put into the context of linear algebra. For the moment we consider the infinite-dimensional complex space $\mathcal{S}$ of doubly infinite sequences
$$\ldots, s_{-2}, s_{-1}, s_0, s_1, s_2, \ldots,$$
and define the shift operator $\sigma$ on $\mathcal{S}$ to be the function that shifts a sequence $s \in \mathcal{S}$ one step to the right,
$$\sigma((s_n)) = (s_{n-1}).$$
This operator is defined on the vector space $\mathcal{S}$. We’d like a right-shift operator on $\mathcal{S}^+$, the space of singly infinite sequences, but how is it defined? When we shift a singly infinite sequence one step to the right, what fills the vacated $0$th term? In Exercise 3.1 you show that only one assignment results in a linear operator: the $0$th term must equal $0$. Accordingly, the linear shift operator $\sigma$ on $\mathcal{S}^+$ is defined as
$$\sigma((s_0, s_1, s_2, s_3, \ldots)) = (0, s_0, s_1, s_2, s_3, \ldots). \tag{3.9}$$
(Here we’re using the same symbol for the shift operator on both $\mathcal{S}$ and $\mathcal{S}^+$.) In Chapter 2 we used powers of the differentiation operator (repeated composition of the operator $D$), which are also linear. In general, integer powers of a linear operator are linear, and in fact, any polynomial in a linear operator is a linear operator. For example, the operator $\sigma^k$ shifts a sequence $k$ steps to the right. Applying the operator $2 - 3\sigma + 4\sigma^3$ to any sequence $(s_n)$ results in a sequence whose $n$th term (for $n \ge 3$) is $2s_n - 3s_{n-1} + 4s_{n-3}$. The left side of (L) is the $n$th term of the sequence $s - c_1\sigma(s) - c_2\sigma^2(s) - \cdots - c_k\sigma^k(s)$, so it is precisely $L[s]$, where $L$ is the linear operator $L = I - c_1\sigma - c_2\sigma^2 - \cdots - c_k\sigma^k$. This representation allows us to see another analogy between difference equations and differential equations.
The general $k$th order linear constant coefficient differential equation has the form
$$x(t) - c_1 D(x(t)) - c_2 D^2(x(t)) - \cdots - c_k D^k(x(t)) = \beta(t), \tag{3.10}$$
where $D$ is the differentiation operator on the vector space of infinitely differentiable functions on $\mathbb{R}$ or $\mathbb{C}$ (or some other convenient domain of definition). The left side of (3.10) can be written as $L_D(x)$ for
$$L_D = I - c_1 D - c_2 D^2 - \cdots - c_k D^k,$$
and (3.10) becomes the functional equation $L_D(x) = \beta$. Likewise, the difference equation (L) can be written as $L[s] = \psi$, where
$$\psi = (0, 0, \ldots, 0, \psi(k), \psi(k+1), \ldots)$$
is an element of the vector space $\mathcal{S}^+$. Then $s \in \mathcal{S}^+$ is a solution to (L) iff the sequence $L[s] - \psi$ is zero from the $k$th term onwards. We can now use linear algebra, because
$$\mathcal{V}_k = \{s \in \mathcal{S}^+ : s_n = 0 \text{ for all } n \ge k\}$$
is a $k$-dimensional subspace of $\mathcal{S}^+$ (refer to Exercise 3.3) and
$$s \in \mathcal{S}^+ \text{ is a solution to (L)} \iff L[s] - \psi \in \mathcal{V}_k. \tag{3.11}$$
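Theorem 3.5.1. For any choice of constants $c_1, c_2, \ldots, c_k \in \mathbb{C}$, the operator $L = I - c_1\sigma - c_2\sigma^2 - \cdots - c_k\sigma^k$ is invertible on $\mathcal{S}^+$, and the set of preimages $L^{-1}(\mathcal{V}_k)$ is a $k$-dimensional subspace of $\mathcal{S}^+$.

Proof. To prove the invertibility of $L$ on $\mathcal{S}^+$ we show that any choice of sequence $y \in \mathcal{S}^+$ has a unique preimage $s \in \mathcal{S}^+$. For this, define the complex-valued function $\psi$ on $\mathbb{N}$ by $\psi(n) = y_n$, the $n$th term of the given sequence $y$. Because $\sigma$ fills vacated terms with $0$, the equation $L[s] = y$ says exactly that
$$s_n - c_1 s_{n-1} - c_2 s_{n-2} - \cdots - c_k s_{n-k} = \psi(n) \quad\text{for all } n \ge 0,$$
where any term $s_j$ with $j < 0$ is read as $0$. Each $s_n$ is therefore determined by $\psi(n)$ and $s_0, \ldots, s_{n-1}$. In particular, the initial values $s_0, \ldots, s_{k-1}$ are fixed by $y_0, \ldots, y_{k-1}$, and, as for any initial value problem, there is exactly one solution $s \in \mathcal{S}^+$. This proves that there exists a unique $s \in \mathcal{S}^+$ with $L[s] = y$. Since the restriction of $L^{-1}$ to the $k$-dimensional subspace $\mathcal{V}_k$ must be invertible, $L^{-1}(\mathcal{V}_k)$ is a $k$-dimensional subspace of $\mathcal{S}^+$. □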
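On a finite prefix, $L$ and $L^{-1}$ are easy to compute, which makes the invertibility in Theorem 3.5.1 concrete. In this sketch (hypothetical names), terms with negative index are treated as $0$, matching the definition (3.9) of $\sigma$ on $\mathcal{S}^+$. The coefficients `c = [c_1, ..., c_k]` in the usage below are arbitrary sample values.

```python
def apply_L(c, s):
    """(L s)_n = s_n - c_1 s_{n-1} - ... - c_k s_{n-k}, with s_j = 0 for j < 0."""
    k = len(c)
    return [s[n] - sum(c[i - 1] * s[n - i] for i in range(1, k + 1) if n - i >= 0)
            for n in range(len(s))]


def invert_L(c, y):
    """The unique prefix s with L[s] = y, built term by term: s_n = y_n + sum_i c_i s_{n-i}."""
    k = len(c)
    s = []
    for n in range(len(y)):
        s.append(y[n] + sum(c[i - 1] * s[n - i] for i in range(1, k + 1) if n - i >= 0))
    return s


c = [2, -3, 5]
y = [1, -4, 0, 7, 2, 2, 9, -1]
assert apply_L(c, invert_L(c, y)) == y and invert_L(c, apply_L(c, y)) == y
```

Applying `invert_L` to a unit sequence (equal to $1$ at one index $j < k$ and $0$ elsewhere) gives an element of $L^{-1}(\mathcal{V}_k)$. Each such preimage satisfies the homogeneous recurrence from index $k$ on, and the $k$ of them form a basis of that subspace.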


Applying this result to (3.11), we obtain
$$s \in \mathcal{S}^+ \text{ is a solution to (L)} \iff s \in L^{-1}(\psi) + L^{-1}(\mathcal{V}_k). \tag{3.12}$$
Geometrically, this means that the set of solutions to (L) forms a $k$-dimensional affine subspace, a translate of a $k$-dimensional subspace by a single vector. Algebraically, this means that, as we discussed in Section 3.2, we can find the general solution to (L) in two steps. First find a particular solution to $L[s] = \psi$, then add the general solution to the homogeneous equation (the case $\psi = 0$ of (L)). Once the general solution is known, any initial value problem can be solved by plugging in the initial values. The technique in Theorem 3.3.1 can be used to find a particular solution for equations with forcing functions of the form $\psi(n) = \lambda^n p(n)$. However, there’s no general algorithm that gives a particular solution in closed form, and in fact, some classes of linear recurrences don’t even have a closed-form solution [47, 136].
We apply this theory to solve the initial value problem
$$s_0 = 0, \quad s_1 = 1, \quad s_n = 4s_{n-1} - 4s_{n-2} + 3^n(n-1),$$
where $\psi(n) = \lambda^n p(n)$ for $\lambda = 3$ and $p(x) = x - 1$, a polynomial of degree $d = 1$. Using Theorem 3.3.1: since $\lambda_1 = 2$ is the only root of $\mathrm{ch}(x) = (x-2)^2$ and $\lambda \ne \lambda_1$, we have $\delta = 0$, and there is a particular solution of the form
$$v_n = 3^n q(n)$$
for some polynomial $q(x)$ with $\deg(q) \le \deg(p) = 1$. A particular solution $(v_n)$ can be constructed using the method in the theorem. For this, $\mathrm{ch}(3) = 1$ and $\mathrm{ch}'(3) = 2$, which gives
$$t_0 = \mathrm{ch}(3) = 1 \quad\text{and}\quad t_1 = 2\,\mathrm{ch}(3) - 3\,\mathrm{ch}'(3) = -4.$$
Since $p_1 = 1$ and $p_0 = -1$, the coefficients of $q(x)$ satisfy the associated system of equations
$$\begin{pmatrix} 1 & 0\\ 4 & 1 \end{pmatrix}\begin{pmatrix} q_1\\ q_0 \end{pmatrix} = 3^2\begin{pmatrix} 1\\ -1 \end{pmatrix} = \begin{pmatrix} 9\\ -9 \end{pmatrix},$$
giving $q_1 = 9$ and $q_0 = -4q_1 - 9 = -45$. This means that $v_n = 3^n(9n - 45)$ is a particular solution to the recurrence. Although $(v_n)$ doesn’t satisfy the initial conditions, every solution to the recurrence has the form $s_n = h_n + v_n$, where $h_n = (\alpha n + \beta)2^n$ is the general solution to the homogeneous recurrence. From the initial conditions, $0 = s_0 = \beta - 45$ and $1 = s_1 = 2(\alpha + \beta) - 108$. So $\beta = 45$ and $\alpha = 19/2$, and the solution is
$$s_n = \Big(\frac{19}{2}n + 45\Big)2^n + 3^n(9n - 45).$$
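The solution of this initial value problem can be cross-checked against direct iteration. Matching $s_n = (\alpha n + \beta)2^n + 3^n(9n - 45)$ to $s_0 = 0$ and $s_1 = 1$ gives $\beta = 45$ and $\alpha = 19/2$ (our own arithmetic). The sketch below compares the resulting formula with the recurrence term by term, using exact fractions. The helper names are illustrative.

```python
from fractions import Fraction


def iterate(n_max):
    """s_0 = 0, s_1 = 1, s_n = 4 s_{n-1} - 4 s_{n-2} + 3^n (n - 1)."""
    s = [0, 1]
    for n in range(2, n_max + 1):
        s.append(4 * s[-1] - 4 * s[-2] + 3 ** n * (n - 1))
    return s


def closed(n, alpha=Fraction(19, 2), beta=Fraction(45)):
    """h_n + v_n with h_n = (alpha n + beta) 2^n and v_n = 3^n (9n - 45)."""
    return (alpha * n + beta) * 2 ** n + 3 ** n * (9 * n - 45)


s = iterate(25)
assert all(closed(n) == s[n] for n in range(26))
```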


3.6 Formal Power Series

We’ve already used polynomials in $\sigma$, and now we want to go one step further and discuss formal power series in the linear operator $\sigma$. The adjective “formal” is used because we aren’t concerned with convergence of the power series but rather treat it as a purely formal object. In this chapter we concentrate on the theory of formal power series, and this will form the foundation for the next chapter on generating functions. The interested reader should consult Ivan Niven’s paper [121] for a good introduction to the application of formal power series to number theory and combinatorics.


Definition 3.6.1. A formal power series in the variable $x$ is an infinite sum
$$\gamma = \sum_{i\ge 0} a_i x^i$$
with coefficients $a_i$ in $\mathbb{C}$. Two formal power series are equal iff all corresponding coefficients are equal.


Note that a polynomial in $x$ is a power series whose coefficients are zero after some point. Because of this, a formal power series can be thought of as an “infinite” polynomial. This is exactly the point of view taken for the operations of addition and multiplication of formal power series. For $\gamma_1 = \sum_{i\ge 0} a_i x^i$ and $\gamma_2 = \sum_{i\ge 0} b_i x^i$, their sum is
$$\gamma_1 + \gamma_2 = \sum_{i\ge 0}(a_i + b_i)x^i.$$
For multiplication of power series, we also mimic what happens with polynomials. The product $\gamma_1\cdot\gamma_2$ has the constant term $a_0b_0$; the coefficient of $x$ in the product is $a_0b_1 + a_1b_0$; the coefficient of $x^2$ is $a_0b_2 + a_1b_1 + a_2b_0$, and so on. This leads to the following general formula for the product of two power series:
$$\Big(\sum_{i\ge 0} a_i x^i\Big)\Big(\sum_{j\ge 0} b_j x^j\Big) = \sum_{n\ge 0}\Big(\sum_{i+j=n} a_i b_j\Big)x^n. \tag{3.13}$$
The product of power series defined here is called a convolution. Exercises 3.20 to 3.25 form a series of exercises on convolution.
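The two operations can be sketched on truncated coefficient lists, where index $i$ holds the coefficient of $x^i$ and a missing coefficient counts as $0$. Only the first `n_terms` coefficients of a product are computed. By (3.13), each of these is a finite sum of $n + 1$ products, so the truncation loses nothing for them. The helper names are illustrative.

```python
from typing import List, Sequence


def series_add(a: Sequence[complex], b: Sequence[complex]) -> List[complex]:
    """Coefficientwise sum; a missing coefficient counts as 0."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def series_mul(a: Sequence[complex], b: Sequence[complex], n_terms: int) -> List[complex]:
    """First n_terms coefficients of the product by (3.13):
    the n-th coefficient is sum_{i+j=n} a_i b_j, a finite sum of n + 1 products."""
    def coef(s, i):
        return s[i] if i < len(s) else 0
    return [sum(coef(a, i) * coef(b, n - i) for i in range(n + 1)) for n in range(n_terms)]
```

For instance, `series_mul([1, -1], [1] * 6, 6)` gives `[1, 0, 0, 0, 0, 0]`, the first coefficients of $(1 - x)\sum_{i\ge 0}x^i = 1$.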


An important point to note is that since there are only $n+1$ pairs of indices $i, j$ that sum to $n$, each coefficient on the right side is a finite sum of products $a_ib_j$ in $\mathbb{C}$, and as such is a complex number. This means that the product $\gamma_1\gamma_2$ is a formal power series. Using these definitions for addition and multiplication, formal power series behave as polynomials in many respects, and Theorem 3.6.1 below summarizes the algebraic properties of the integral domain of formal power series. To get there we will take a detour through a proof technique called the finiteness argument, which explains a good deal about the way formal power series behave algebraically.

Given any formal power series $\gamma = \sum a_i x^i$, we define its partial sums just as in calculus: for any nonnegative integer $d$, the $d$th partial sum is the polynomial $\gamma_d = \sum_{i=0}^{d} a_i x^i$. As an example, let us use partial sums to investigate the computation of the coefficients in $\gamma^2$. From the definition of multiplication, the $0$th coefficient of $\gamma^2$ is $a_0^2$; the first coefficient is $2a_0a_1$; the second is $a_1^2 + 2a_0a_2$; and so on. Compare this to the sequence of squares of the partial sums:
$$\begin{aligned}
\gamma_0^2 &= a_0^2,\\
\gamma_1^2 &= a_0^2 + 2a_0a_1x + a_1^2x^2,\\
\gamma_2^2 &= a_0^2 + 2a_0a_1x + (a_1^2 + 2a_0a_2)x^2 + 2a_1a_2x^3 + a_2^2x^4,\\
&\;\;\vdots
\end{aligned}$$
What happens is that the $0$th coefficient of $\gamma^2$ is the same as that of $\gamma_d^2$ for all $d$; the first coefficient of $\gamma^2$ is the same as the first coefficient of $\gamma_d^2$ for $d \ge 1$; the second coefficient of $\gamma^2$ is the same as that of $\gamma_d^2$ for $d \ge 2$. (We will prove that this pattern continues.) This is what we refer to as the Finiteness Argument. It allows us to reduce any question about finitely many coefficients of a formal power series to a question about the coefficients of a well-chosen partial sum.
For example, the statement
“The 17th coefficient of $\gamma^2$ is positive.”
can be checked by testing the 17th coefficient of any $\gamma_d^2$ with $d \ge 17$, but in general it cannot be checked by looking at the 17th coefficient of $\gamma_{16}^2$. In a similar way, consider a statement such as
“All the coefficients of $\gamma^2$ are positive.”
If we can show that for every $d \ge 0$ the first $d$ coefficients of $\gamma_d^2$ are positive, then we could prove this statement as well.

The same ideas apply to more complicated expressions in more than one formal power series. For example, for power series $\gamma, \delta, \epsilon$, the $n$th coefficient of $3\delta - 4\gamma\epsilon^4 + \epsilon$ coincides with the $n$th coefficient of the polynomial $3\delta_d - 4\gamma_d\epsilon_d^4 + \epsilon_d$ for any $d \ge n$. Writing this more succinctly, for $\Psi(x, y, z) = 3y - 4xz^4 + z$, the $n$th coefficient of $\Psi(\gamma, \delta, \epsilon)$ equals the $n$th coefficient of $\Psi(\gamma_d, \delta_d, \epsilon_d)$ for all $d \ge n$.
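The stabilization can be observed numerically. The sketch below (helper names illustrative) evaluates $\Psi(x, y, z) = 3y - 4xz^4 + z$ on partial sums of three sample series, with coefficients $i+1$, $(-1)^i$ and $2^i$ chosen only for illustration. It checks that the $n$th coefficient takes the same value for every $d \ge n$ in a range.

```python
def poly_mul(a, b):
    """Full product of two polynomials (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    """Coefficientwise sum of two polynomials (lowest degree first)."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def psi_of(x, y, z):
    """Psi(x, y, z) = 3y - 4 x z^4 + z, evaluated on polynomials."""
    z2 = poly_mul(z, z)
    x_z4 = poly_mul(x, poly_mul(z2, z2))
    return poly_add(poly_add([3 * t for t in y], [-4 * t for t in x_z4]), z)


def nth_coeff_of_partial(n, d, series):
    """n-th coefficient of Psi(gamma_d, delta_d, epsilon_d); series = coefficient functions."""
    parts = [[f(i) for i in range(d + 1)] for f in series]
    poly = psi_of(*parts)
    return poly[n] if n < len(poly) else 0


# Sample series (illustrative): gamma_i = i + 1, delta_i = (-1)^i, epsilon_i = 2^i.
series = (lambda i: i + 1, lambda i: (-1) ** i, lambda i: 2 ** i)
for n in range(8):
    values = {nth_coeff_of_partial(n, d, series) for d in range(n, n + 6)}
    assert len(values) == 1  # the n-th coefficient no longer changes once d >= n
```

Taking $d < n$ can give a different value. For instance, the 3rd coefficient computed from $\gamma_2, \delta_2, \epsilon_2$ differs from the one computed from $\gamma_3, \delta_3, \epsilon_3$.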


The Finiteness Argument

Let $\Psi(x_1, \ldots, x_\ell)$ be a polynomial in the variables $x_1, \ldots, x_\ell$. Let $\gamma_1, \ldots, \gamma_\ell$ be formal power series in $x$, and for each $i$ let $(\gamma_i)_d$ denote the $d$th partial sum of $\gamma_i$. Then for each $n$, the $n$th coefficient of $\Psi(\gamma_1, \ldots, \gamma_\ell)$ coincides with the $n$th coefficient of the polynomial $\Psi((\gamma_1)_d, \ldots, (\gamma_\ell)_d)$ for all $d \ge n$.

The idea behind the proof of the Finiteness Argument is the following. First consider the most basic polynomials in two variables,
$$\Psi_1(x_1, x_2) = x_1 + x_2 \quad\text{and}\quad \Psi_2(x_1, x_2) = x_1x_2,$$
and two power series $\gamma_1, \gamma_2$. The conclusion of the Finiteness Argument for each of $\Psi_1(\gamma_1, \gamma_2)$ and $\Psi_2(\gamma_1, \gamma_2)$ follows from the respective definitions of addition and multiplication for power series: in both cases the $n$th coefficient of the result depends only on the coefficients of index at most $n$ of the two operands. Since the arithmetic operations inherent in the polynomial $\Psi$ form a finite sequence of additions and multiplications of pairs of formal power series, the result follows by induction on the number of operations used to build up to $\Psi(\gamma_1, \ldots, \gamma_\ell)$.

The Finiteness Argument is a useful way to see that algebraic properties 
of polynomials transfer to the same properties for formal power series. 


Theorem 3.6.1. The set of formal power series in x (with coefficients in 
R or C) forms an integral domain under the operations of addition and 
multiplication. This means that the following properties hold: 

(a) The operations of addition and multiplication satisfy the associative 
and commutative laws, and also satisfy the usual distributive law of 
multiplication over addition. 

(b) The constant polynomial 0 is the additive identity. 
(c) Every power series has an additive inverse, obtained by negating all
of its coefficients. 

(d) The constant polynomial 1 is the multiplicative identity. 

(e) Whenever the product $\gamma_1\gamma_2$ equals the zero power series, at least one of the factors $\gamma_i$ must equal zero.


Proof. We'll prove that the distributive law holds and then just say that all other properties can be proved in the same manner. Let $\gamma$, $\delta$, and $\epsilon$ be three arbitrary formal power series in $x$. The distributive law says that $\gamma(\delta + \epsilon) = \gamma\delta + \gamma\epsilon$. This is an infinite string of statements asserting an equality between the $n$th coefficient of $\gamma(\delta + \epsilon)$ and the $n$th coefficient of $\gamma\delta + \gamma\epsilon$. Fix any $n \ge 0$. Since polynomial multiplication distributes over polynomial addition, for every $d \ge n$ we have the partial sum arithmetic $\gamma_d(\delta_d + \epsilon_d) = \gamma_d\delta_d + \gamma_d\epsilon_d$. Therefore, the Finiteness Argument implies that the $n$th coefficients of $\gamma(\delta + \epsilon)$ and $\gamma\delta + \gamma\epsilon$ agree. Since $n$ was arbitrary, $\gamma(\delta + \epsilon) = \gamma\delta + \gamma\epsilon$, proving that the distributive law does hold for power series. □


We use the modifier finiteness because the argument reduces a question about the infinitely many coefficients in a formal power series to the analogous question for its partial sums, which are polynomials (finite power series). There’s a more sophisticated finiteness argument that involves the substitution of polynomials into power series. For example, if we substitute the polynomial $x^2$ into the power series

$$1 + 3x + 5x^2 + \cdots = \sum_{i \ge 0}(2i + 1)x^i,$$

we would expect to get the power series

$$1 + 3x^2 + 5x^4 + \cdots = \sum_{i \ge 0}(2i + 1)x^{2i}.$$

Substituting the polynomial $x + x^2$ into the same power series should yield

$$1 + 3(x + x^2) + 5(x + x^2)^2 + 7(x + x^2)^3 + \cdots = 1 + 3x + 8x^2 + 17x^3 + 35x^4 + \cdots.$$

This type of substitution works because the coefficient of the $x^n$ term in the new power series is a finite sum of monomials.
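The substitution can be carried out mechanically on coefficient lists. This Python sketch (an illustration with hypothetical function names, not the book's method) substitutes a polynomial $P$ with zero constant term into a partial sum of $\sum (2i+1)x^i$, keeping only the terms up to $x^N$. Truncating each power $P(x)^i$ at $x^N$ is safe precisely because $P$ has zero constant term.

```python
def poly_mul_trunc(p, q, N):
    """Product of two polynomials, keeping only the coefficients of x^0..x^N."""
    out = [0] * (N + 1)
    for i, a in enumerate(p[: N + 1]):
        for j, b in enumerate(q[: N + 1 - i]):
            out[i + j] += a * b
    return out


def substitute(coeffs, P, N):
    """Coefficients x^0..x^N of sum_i coeffs[i] * P(x)^i.

    P must have zero constant term, so P(x)^i only contributes to x^i and
    higher powers; truncating at x^N therefore loses nothing below x^(N+1).
    """
    if P[0] != 0:
        raise ValueError("P must have zero constant term")
    result = [0] * (N + 1)
    power = [1] + [0] * N          # P(x)^0 = 1
    for a in coeffs:
        result = [r + a * c for r, c in zip(result, power)]
        power = poly_mul_trunc(power, P, N)
    return result


def odd_partial_sum(d):
    """Partial sum 1 + 3x + 5x^2 + ... + (2d+1)x^d as a coefficient list."""
    return [2 * i + 1 for i in range(d + 1)]


N = 6
print(substitute(odd_partial_sum(N), [0, 1, 1], N))  # P = x + x^2: [1, 3, 8, 17, 35, 68, 129]
print(substitute(odd_partial_sum(N), [0, 0, 1], N))  # P = x^2:     [1, 0, 3, 0, 5, 0, 7]
```

Passing a longer partial sum such as `odd_partial_sum(12)` leaves these seven coefficients unchanged, which is the stability for $d \ge n$ discussed in the next paragraph.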

More precisely, consider any power series $\gamma = \sum_{i \ge 0} a_i x^i$ and a polynomial $P(x)$ with zero constant term. Because the constant term of $P(x)$ is zero, the exponent of every term in the power $P(x)^i$ is at least $i$. Thus, in the expansion of

$$\gamma(P(x)) = a_0 + a_1 P(x) + a_2 P(x)^2 + \cdots,$$

no summand $a_i P(x)^i$ with $i > n$ can contribute to the $x^n$ term. This shows two things: that the $x^n$ term in $\gamma(P(x))$ is a finite sum of monomials (proving that $\gamma(P(x))$ is a well-defined formal power series) and that the $n$th term of $\gamma(P(x))$ is the same as the $n$th term of the polynomial $\gamma_d(P(x))$ for each $d \ge n$. This allows the following extension of the Finiteness Argument:


The Extended Finiteness Argument
Let $\Psi(x_1, \dots, x_t)$ be a polynomial in the variables $x_1, \dots, x_t$.
Let $\gamma_1, \dots, \gamma_t$ be formal power series in $x$.
Let $P_1(x), \dots, P_t(x)$ be polynomials with zero constant term.
Then

$$\Psi(\gamma_1(P_1), \gamma_2(P_2), \dots, \gamma_t(P_t))$$

is a well-defined power series whose $n$th term coincides with the $n$th term of the polynomial

$$\Psi((\gamma_1)_d(P_1), (\gamma_2)_d(P_2), \dots, (\gamma_t)_d(P_t)) \quad\text{for each } d \ge n.$$


In Theorem 3.6.1 the set of formal power series was shown to be an integral domain, so called because it has the same algebraic properties as the integers. Unlike what happens in $\mathbb{Z}$, where only $\pm 1$ have multiplicative inverses, the next result shows that there are many power series that have multiplicative inverses.


Theorem 3.6.2. Let $P(x)$ be a polynomial in $x$ with zero constant term. Then $\Gamma = \sum_{i \ge 0} P(x)^i$ is a well-defined formal power series and $\Gamma$ is the multiplicative inverse of $1 - P(x)$ in the integral domain of formal power series. We denote this inverse by $1/(1 - P(x))$.


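Proof. We have already proved that $\Gamma$ is well-defined. To show that $\Gamma$ is the inverse of $1 - P(x)$, set $\Psi(x_1, x_2) = x_1 x_2$ and

$$\gamma_1 = 1 - x, \qquad \gamma_2 = \sum_{i \ge 0} x^i, \qquad P_1(x) = P_2(x) = P(x).$$

Then for each $d \ge 1$, the $d$th partial sums of $\gamma_1$ and $\gamma_2$ are $(\gamma_1)_d = 1 - x$ and $(\gamma_2)_d = 1 + x + \cdots + x^d$. Polynomial multiplication gives

$$(\gamma_1)_d \cdot (\gamma_2)_d = 1 - x^{d+1}$$

and

$$\Psi((\gamma_1)_d(P), (\gamma_2)_d(P)) = 1 - P(x)^{d+1}.$$

Since every term of $P(x)^{d+1}$ has degree at least $d + 1$, the $0$th term of $\Psi((\gamma_1)_d(P), (\gamma_2)_d(P))$ is $1$ and its $n$th term is $0$ for $1 \le n \le d$. Because $d$ can be taken as large as we like, the Extended Finiteness Argument gives

$$1 = \Psi(\gamma_1(P), \gamma_2(P)) = (1 - P(x))\sum_{i \ge 0} P(x)^i,$$

as required. □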
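Theorem 3.6.2 can be illustrated numerically: summing $P(x)^i$ for $i \le N$ already gives the inverse of $1 - P(x)$ through the $x^N$ term, because higher powers of $P$ begin beyond $x^N$. In this Python sketch the choice $P(x) = x + 2x^2$ and the function names are assumptions made for illustration.

```python
def poly_mul_trunc(p, q, N):
    """Product of two polynomials, keeping only the coefficients of x^0..x^N."""
    out = [0] * (N + 1)
    for i, a in enumerate(p[: N + 1]):
        for j, b in enumerate(q[: N + 1 - i]):
            out[i + j] += a * b
    return out


def inverse_one_minus(P, N):
    """Coefficients x^0..x^N of sum_{i>=0} P(x)^i, the inverse of 1 - P(x)."""
    if P[0] != 0:
        raise ValueError("P must have zero constant term")
    total = [0] * (N + 1)
    power = [1] + [0] * N                  # P(x)^0
    for _ in range(N + 1):                 # P(x)^i with i > N starts past x^N
        total = [t + c for t, c in zip(total, power)]
        power = poly_mul_trunc(power, P, N)
    return total


N = 8
P = [0, 1, 2]                              # P(x) = x + 2x^2 (illustrative choice)
T = inverse_one_minus(P, N)
one_minus_P = [1] + [-c for c in P[1:]]
print(T)                                   # [1, 1, 3, 5, 11, 21, 43, 85, 171]
print(poly_mul_trunc(one_minus_P, T, N))   # [1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Taking $P(x) = \lambda x$ instead, for example `inverse_one_minus([0, 3], 5)`, returns the powers $3^n$, which is the geometric series of the next theorem.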
Using $P(x) = \lambda x$ in the last result we have the following theorem.


Theorem 3.6.3 (The Geometric Power Series). For non-zero $\lambda \in \mathbb{C}$, the inverse of $1 - \lambda x$ is

$$\frac{1}{1 - \lambda x} = \sum_{n \ge 0}\lambda^n x^n. \tag{3.14}$$


When $\lambda$ is restricted to be a real number and $x$ satisfies $|\lambda x| < 1$, Theorem 3.6.3 becomes the formula for the geometric series from calculus. In calculus it’s a statement about a function defined within its radius of convergence $|x| < |\lambda|^{-1}$, and (3.14) has no meaning outside its radius of convergence. For us, $\sum_{n \ge 0}\lambda^n x^n$ is a purely formal object that algebraically equals $1/(1 - \lambda x)$.


3.6.1 Formal differentiation 


We next use a process of formal differentiation to develop a formal power series for the rational function $(1 - \lambda x)^{-m}$, which is the inverse of $(1 - \lambda x)^m$. These series will be used in the next chapter on generating functions.

The formal derivative of a power series $\gamma = \sum_{i \ge 0} a_i x^i$ is defined to be the formal power series

$$D(\gamma) = \sum_{i \ge 1} i a_i x^{i-1}. \tag{3.15}$$
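Definition (3.15) translates directly into code on coefficient lists. This Python sketch (function names are illustrative) implements $D$, checks the identity $D(\sum x^n) = \sum (n+1)x^n = (\sum x^n)^2$ through a finite order, and spot-checks the product rule on two arbitrarily chosen polynomials.

```python
def D(coeffs):
    """Formal derivative (3.15): [a_0, a_1, a_2, ...] -> [1*a_1, 2*a_2, ...]."""
    return [i * a for i, a in enumerate(coeffs)][1:]


def mul_trunc(p, q, N):
    """Product of two coefficient lists, keeping x^0..x^N."""
    out = [0] * (N + 1)
    for i, a in enumerate(p[: N + 1]):
        for j, b in enumerate(q[: N + 1 - i]):
            out[i + j] += a * b
    return out


N = 8
geom = [1] * (N + 1)                 # sum x^n through x^N
print(D(geom))                       # [1, 2, 3, 4, 5, 6, 7, 8]
print(mul_trunc(geom, geom, N - 1))  # [1, 2, 3, 4, 5, 6, 7, 8]

# Product rule D(fg) = D(f) g + f D(g) for two arbitrary polynomials of degree 5.
f, g = [1, 2, 0, 5, -1, 3], [2, -1, 4, 0, 1, 7]
lhs = D(mul_trunc(f, g, 10))
rhs = [a + b for a, b in zip(mul_trunc(D(f), g, 9), mul_trunc(f, D(g), 9))]
print(lhs == rhs)                    # True
```

Because `f` and `g` are polynomials, every list here is exact rather than truncated, so the comparison checks the product rule exactly.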


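A direct verification shows that the usual sum rule and the product rule also hold for formal derivatives. Using the definition of convolution,

$$\frac{1}{(1 - x)^2} = \Big(\sum_{n \ge 0} x^n\Big)\Big(\sum_{n \ge 0} x^n\Big) = \sum_{n \ge 0}\Big(\sum_{i + j = n} 1\Big)x^n = \sum_{n \ge 0}(n + 1)x^n = D\Big(\sum_{n \ge 0} x^n\Big) = D\Big(\frac{1}{1 - x}\Big),$$

and induction gives (refer to Exercise 3.35)

$$D^j\Big(\sum_{n \ge 0} x^n\Big) = \frac{j!}{(1 - x)^{j+1}}, \tag{3.16}$$

where $0! = 1$ and $D^0$ is the identity operator. Taking $j = m - 1$ in (3.16) and differentiating $\sum x^n$ term by term, we have

$$\frac{(m - 1)!}{(1 - x)^m} = D^{m-1}\Big(\sum_{n \ge 0} x^n\Big) = (m - 1)!\sum_{n \ge 0}\binom{n + m - 1}{n}x^n,$$

and substituting $\lambda x$ for $x$ (permitted because $\lambda x$ has zero constant term) gives

$$\frac{1}{(1 - \lambda x)^m} = \sum_{n \ge 0}\binom{n + m - 1}{n}\lambda^n x^n, \tag{3.17}$$

the power series for $(1 - \lambda x)^{-m}$ when $m$ is a positive integer. This can be called a generalized Binomial Theorem. The name makes sense because if we define the generalized binomial coefficient $\binom{r}{n}$ for any rational $r$ and natural number $n$ by

$$\binom{r}{n} = \frac{r(r - 1)\cdots(r - n + 1)}{n!},$$

then for a negative integer $r = -m$ we have

$$\binom{-m}{n} = (-1)^n\binom{m + n - 1}{n},$$

and (3.17) becomes

$$(1 - \lambda x)^{-m} = \sum_{n \ge 0}\binom{-m}{n}(-\lambda x)^n.$$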


In [121], the Generalized Binomial Theorem

$$(1 + x)^r = \sum_{n \ge 0}\binom{r}{n}x^n \tag{3.18}$$

is proved for all rational $r$.
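Generalized binomial coefficients are easy to compute exactly with rational arithmetic. This Python sketch (hypothetical function names; the sample values $m = 3$, $\lambda = 2$, $N = 6$ are arbitrary) checks that $\binom{-m}{n}(-\lambda)^n = \binom{n+m-1}{n}\lambda^n$, so the two forms of (3.17) agree, and that the coefficients of $(1+x)^{1/2}$ given by (3.18) square to $1 + x$ through the $x^N$ term.

```python
from fractions import Fraction
from math import comb, factorial


def gen_binom(r, n):
    """Generalized binomial coefficient r(r-1)...(r-n+1)/n! for rational r."""
    r = Fraction(r)
    num = Fraction(1)
    for k in range(n):
        num *= r - k
    return num / factorial(n)


def mul_trunc(p, q, N):
    """Product of two coefficient lists, keeping x^0..x^N."""
    out = [Fraction(0)] * (N + 1)
    for i, a in enumerate(p[: N + 1]):
        for j, b in enumerate(q[: N + 1 - i]):
            out[i + j] += a * b
    return out


m, lam, N = 3, 2, 6
via_negative_r = [gen_binom(-m, n) * (-lam) ** n for n in range(N + 1)]
via_positive = [comb(n + m - 1, n) * lam ** n for n in range(N + 1)]
print(via_negative_r == via_positive)          # True

sqrt_coeffs = [gen_binom(Fraction(1, 2), n) for n in range(N + 1)]
print(mul_trunc(sqrt_coeffs, sqrt_coeffs, N) == [1, 1] + [0] * (N - 1))  # True
```

Using `Fraction` rather than floats keeps every coefficient exact, so the comparisons are equalities rather than approximations.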

As often happens in mathematics, a good idea like formal power series can 
be used in a wide variety of areas. Formal power series with real coefficients 
are widely used in combinatorics (for example, refer to [153, 154]), and for- 
mal power series with complex coefficients have applications in physics and 
statistics. For these applications, as well as for technical reasons, convolution for power series with complex coefficients is defined slightly differently. For other applications, such as the theory of automata and formal languages (refer to [143]), formal power series are further generalized to allow the coefficients to lie in a ring or semiring rather than in a field.


3.6.2 An application of formal power series


We now return to the question of inverting the operator $L = I - c_1\sigma - c_2\sigma^2 - \cdots - c_k\sigma^k$, where $\sigma$ is the shift operator $\sigma((s_n)) = (0, s_0, s_1, \dots)$ on $S^+$, the space of infinite sequences of complex numbers. In order to make sense of this we need to define the formal power series $\Gamma = \sum_{i \ge 0} a_i\sigma^i$ for any choice of $a_i \in \mathbb{C}$ in such a way that it is a linear operator on $S^+$. For $s \in S^+$, it would be natural to interpret $\Gamma(s)$ as an infinite formal sum of the sequences

$$a_0 s + a_1\sigma(s) + a_2\sigma^2(s) + \cdots.$$

Writing these summand sequences as rows in an infinite array,

$$\begin{array}{ccccc}
a_0 s_0, & a_0 s_1, & a_0 s_2, & a_0 s_3, & \dots\\
0, & a_1 s_0, & a_1 s_1, & a_1 s_2, & \dots\\
0, & 0, & a_2 s_0, & a_2 s_1, & \dots\\
\vdots & & & & \ddots
\end{array}$$

each column has only finitely many non-zero terms, and so the sum of each column is a complex number. This is the motivation for defining $\Gamma(s)$ to be the sequence whose $n$th term is

$$a_0 s_n + a_1 s_{n-1} + \cdots + a_n s_0 = (a_0, a_1, \dots, a_n)\cdot(s_n, s_{n-1}, \dots, s_0).$$

In Exercise 3.34 you prove that the function $\Gamma$ defined in this way is a linear operator on $S^+$.
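The column-sum definition of $\Gamma(s)$ can be sketched on finite prefixes of sequences. In this Python illustration (assumptions: coefficient lists are at least as long as the sequence prefixes, and all names and sample values are hypothetical), `apply_series` uses the formula for the $n$th term directly, while `row_sums` adds the rows $a_i\sigma^i(s)$ of the infinite array; the two agree.

```python
def shift(s):
    """Right shift sigma: (s_0, s_1, ...) -> (0, s_0, s_1, ...), truncated to len(s)."""
    return [0] + s[:-1]


def apply_series(a, s):
    """n-th term a_0 s_n + a_1 s_{n-1} + ... + a_n s_0 of Gamma(s), for n < len(s).

    Assumes len(a) >= len(s).
    """
    return [sum(a[i] * s[n - i] for i in range(n + 1)) for n in range(len(s))]


def row_sums(a, s):
    """Add the rows a_0 s, a_1 sigma(s), a_2 sigma^2(s), ... of the array."""
    total, row = [0] * len(s), list(s)
    for coeff in a[: len(s)]:
        total = [t + coeff * r for t, r in zip(total, row)]
        row = shift(row)
    return total


a = [2, -1, 3, 0, 5]        # hypothetical a_0..a_4
s = [1, 4, 1, 5, 9]         # first five terms of a hypothetical sequence s
print(apply_series(a, s) == row_sums(a, s))   # True
print(apply_series(a, [1, 0, 0, 0, 0]))       # [2, -1, 3, 0, 5], i.e. the a_n
```

The last line shows that applying $\Gamma$ to $(1, 0, 0, \dots)$ returns the coefficient sequence itself, which is the observation used in the next paragraph.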

Next suppose the operators $\Gamma_1 = \sum a_i\sigma^i$ and $\Gamma_2 = \sum b_i\sigma^i$ are equal on $S^+$. This means that $\Gamma_1(s) = \Gamma_2(s)$ for every $s \in S^+$, which is equivalent to requiring that $\Gamma_1 - \Gamma_2$ be the zero operator on $S^+$. In particular this must be true for the sequence $s^* = (1, 0, 0, \dots)$, whose only non-zero term is its $0$th term, $s^*_0 = 1$. Since the $n$th term of $(\Gamma_1 - \Gamma_2)(s^*)$ is $a_n - b_n$, we get $a_n = b_n$ for every $n$, so $\Gamma_1$ and $\Gamma_2$ must be equal as power series. What we’ve just shown is that algebraic identities involving formal power series are also valid identities for the linear operators they represent. For example, if $\Gamma_1\Gamma_2 = 1$ holds in the ring of formal power series, then $\Gamma_1\Gamma_2 = I$ holds for the linear operators, and $\Gamma_2$ is the multiplicative inverse of $\Gamma_1$.

As an example, let’s use the shift operator to solve the first-order recurrence

$$s_{n+1} = \lambda s_n + n^2 \quad\text{with } s_0 = a_0. \tag{3.19}$$

We first shift indices to get $s_n - \lambda s_{n-1} = (n - 1)^2$ and then rewrite the recurrence as $L[s] = w$, where $w = (a_0, 0, 1, 4, 9, \dots)$ and $L = I - \lambda\sigma$. Applying Theorem 3.6.2 with $P(x) = \lambda x$ yields

$$L^{-1} = \sum_{i \ge 0}\lambda^i\sigma^i,$$

and the solution is $s = L^{-1}(w)$, whose $n$th term is

$$s_n = (1, \lambda, \lambda^2, \dots, \lambda^n)\cdot(w_n, w_{n-1}, \dots, w_0).$$

Therefore, the solution to (3.19) is

$$s_n = a_0\lambda^n + \sum_{i=0}^{n-2}(n - 1 - i)^2\lambda^i.$$

(Compare this with Theorem 3.1.2 and our analysis in Section 3.2.1.)
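The closed form can be checked against direct iteration of (3.19) and against applying $L^{-1} = \sum \lambda^i\sigma^i$ to $w$ term by term. In this Python sketch the test values $\lambda = 3$, $a_0 = 7$ and the function names are arbitrary assumptions.

```python
def by_recurrence(lam, a0, N):
    """s_0..s_N from s_{n+1} = lam * s_n + n^2, s_0 = a0."""
    s = [a0]
    for n in range(N):
        s.append(lam * s[n] + n ** 2)
    return s


def by_operator(lam, a0, N):
    """s = L^{-1}(w) with w = (a0, 0, 1, 4, 9, ...): s_n = sum_i lam^i w_{n-i}."""
    w = [a0] + [(n - 1) ** 2 for n in range(1, N + 1)]
    return [sum(lam ** i * w[n - i] for i in range(n + 1)) for n in range(N + 1)]


def by_formula(lam, a0, n):
    """Closed form s_n = a0 lam^n + sum_{i=0}^{n-2} (n-1-i)^2 lam^i."""
    return a0 * lam ** n + sum((n - 1 - i) ** 2 * lam ** i for i in range(n - 1))


lam, a0, N = 3, 7, 10
direct = by_recurrence(lam, a0, N)
print(direct == by_operator(lam, a0, N))                         # True
print(direct == [by_formula(lam, a0, n) for n in range(N + 1)])  # True
```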

In this example, finding the inverse of $L$ was relatively easy. For higher-order recurrences the situation is more difficult. For instance, in the second-order case we must consider the linear operator

$$L = I - c_1\sigma - c_2\sigma^2,$$

and Theorem 3.6.2 gives

$$L^{-1} = \sum_{i \ge 0}(c_1\sigma + c_2\sigma^2)^i = I + (c_1\sigma + c_2\sigma^2) + (c_1\sigma + c_2\sigma^2)^2 + \cdots.$$

Setting $L^{-1} = \sum_{i \ge 0} a_i\sigma^i$ we obtain

$$\begin{aligned}
a_0 &= 1,\\
a_1 &= c_1,\\
a_2 &= c_1^2 + c_2,\\
a_3 &= c_1^3 + 2c_1c_2,\\
a_4 &= c_1^4 + 3c_1^2c_2 + c_2^2,\\
a_5 &= c_1^5 + 4c_1^3c_2 + 3c_1c_2^2,\\
&\;\;\vdots
\end{aligned}$$

The pattern among the $a_i$ seems to be $a_0 = 1$, $a_1 = c_1$, and $a_{i+2} = c_1a_{i+1} + c_2a_i$. The techniques in the next chapter will show that the coefficients of $L^{-1}$ do always satisfy the original recurrence. (Refer to Exercise 4.8.)
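The table of coefficients can be generated by expanding $\sum_i (c_1x + c_2x^2)^i$ with $x$ standing in for $\sigma$, and then compared with the conjectured recurrence. The Python sketch below does this for the arbitrary sample values $c_1 = 2$, $c_2 = 5$; the function names are hypothetical.

```python
def poly_mul_trunc(p, q, N):
    """Product of two polynomials, keeping only the coefficients of x^0..x^N."""
    out = [0] * (N + 1)
    for i, a in enumerate(p[: N + 1]):
        for j, b in enumerate(q[: N + 1 - i]):
            out[i + j] += a * b
    return out


def inverse_coeffs(c1, c2, N):
    """a_0..a_N of L^{-1} = sum_i (c1*x + c2*x^2)^i, with x standing in for sigma."""
    P = [0, c1, c2]
    total = [0] * (N + 1)
    power = [1] + [0] * N
    for _ in range(N + 1):                  # higher powers start past x^N
        total = [t + c for t, c in zip(total, power)]
        power = poly_mul_trunc(power, P, N)
    return total


c1, c2, N = 2, 5, 10
a = inverse_coeffs(c1, c2, N)
print(a[:6])                                                             # [1, 2, 9, 28, 101, 342]
print(all(a[i + 2] == c1 * a[i + 1] + c2 * a[i] for i in range(N - 1)))  # True
print(a[5] == c1**5 + 4 * c1**3 * c2 + 3 * c1 * c2**2)                   # True
```

With $c_1 = c_2 = 1$ the same function returns the shifted Fibonacci numbers $1, 1, 2, 3, 5, \dots$, in line with Exercise 3.38.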


3.7 Exercises
Ex 3.1. Let $\psi: S^+ \to S^+$ be defined by

$$\psi(s_0, s_1, s_2, \dots) = (a, s_0, s_1, s_2, \dots) \quad\text{for some } a \in \mathbb{C}.$$

Show that $\psi$ is linear iff $a = 0$.
Ex 3.2. Consider the (right) shift operator on $S$, $\sigma((s_n)) = (0, s_0, s_1, \dots)$.

(a) Verify that $\sigma$ is a linear transformation on $S$.

(b) Prove that the left shift operator is an inverse for $\sigma$, and therefore $\sigma$ is an invertible operator on $S$.


Ex 3.3. Show that $S^+$, the space of singly infinite sequences, is infinite-dimensional by finding $n$ linearly independent vectors for every $n \ge 1$. Verify that for each $k$, $V_k = \{s \in S^+ : s_n = 0 \text{ for all } n \ge k\}$ is a $k$-dimensional subspace of $S^+$.


Ex 3.4. Show that the Fibonacci sequence $(f_n)$ and the shifted sequence $(f_{n-1})$ are linearly independent. This means that every solution to the Fibonacci recurrence $s_n = s_{n-1} + s_{n-2}$ can be written as $\alpha f_{n-1} + \beta f_n$, where the scalars $\alpha$ and $\beta$ depend on the initial conditions. In particular, find a formula for the Lucas numbers (which satisfy the Fibonacci recurrence with $L_0 = 2$, $L_1 = 1$) in the form $L_n = \alpha f_{n-1} + \beta f_n$. Are there integer sequence solutions to $s_n = s_{n-1} + s_{n-2}$ for which $\alpha$ and $\beta$ are not integers?

Ex 3.5. For this exercise, define a full solution to a $k$th order homogeneous difference equation to be any solution that is not a solution to any lower order difference equation.

(a) Show that if $s$ is a full solution to a $k$th order homogeneous difference equation, then the sequences $s, \sigma(s), \dots, \sigma^{k-1}(s)$ are linearly independent.

(b) Show that $(2^n)$ is not a full solution to $s_n = 4s_{n-1} - 4s_{n-2}$, but $(n2^n)$ is a full solution to the recurrence.

(c) Show that every solution to $s_n = 4s_{n-1} - 4s_{n-2}$ can be written as

$$s_n = \alpha_1 n2^n + \alpha_2(n - 1)2^{n-1}$$

for some constants $\alpha_1, \alpha_2$.

(d) Show that if $(s_n)$ is a full solution to a $k$th order homogeneous difference equation, then every solution $(x_n)$ to the difference equation can be written as

$$x_n = \sum_{i=0}^{k-1}\alpha_i s_{n-i},$$

where the $\alpha_i$’s are constants.
Ex 3.6. For the polynomial $ch(x) = x^k - c_1x^{k-1} - \cdots - c_k$ and any $i = 1, \dots, k$, show that $\sum_{j=1}^{k} c_j(-j)^i\lambda^{k-j}$ can be written as a linear combination of $ch(\lambda), D(ch)(\lambda), \dots, D^i(ch)(\lambda)$.


Ex 3.7. Solve

$$s_n = -3s_{n-1} - 2s_{n-2} + (-1)^n \quad\text{for } n \ge 2, \text{ with } s_0 = 2,\ s_1 = -3.$$
Ex 3.8. For this problem, consider the recurrence

$$s_n = 5s_{n-1} - 6s_{n-2} + 2^n n.$$

(a) Find the general solution to the homogeneous equation.

(b) Show that $v_n = -2^n(n^2 + 7n)$ is a particular solution to the recurrence.

(c) Solve the initial value problem with $s_0 = 5$, $s_1 = 4$.

(d) Solve the initial value problem with $s_0 = 4$, $s_1 = 5$.


Ex 3.9. For this problem, consider the recurrence

$$s_n = 5s_{n-1} - 6s_{n-2} + (-2)^n.$$

(a) Find a particular solution to the recurrence.

(b) Solve the initial value problem with $s_0 = 0$, $s_1 = -7/5$.

(c) Solve the initial value problem with $s_0 = 1$, $s_1 = 2$.


Ex 3.10. Find the general solution to the recurrence

$$s_n = 5s_{n-1} - 6s_{n-2}.$$

Ex 3.11. Find a particular solution to the recurrence

$$s_n = 5s_{n-1} - 6s_{n-2} + 13^n.$$

Ex 3.12. Solve the initial value problem

$$s_n = 5s_{n-1} - 6s_{n-2} + n3^n$$

with $s_0 = 1$ and $s_1 = 2$.


Ex 3.13. For any fixed $a_0, a_1$, solve the initial value problem

$$s_n = 6s_{n-1} - 9s_{n-2} + 2^n n, \qquad s_0 = a_0,\ s_1 = a_1.$$

Ex 3.14. For $\lambda \ne 3$, find a particular solution to

$$s_n = 6s_{n-1} - 9s_{n-2} + \lambda^n n.$$

Ex 3.15. Solve the initial value problem

$$s_0 = 1, \quad s_1 = 2, \quad s_n = 2s_{n-1} + 4s_{n-2} - 8s_{n-3} + \psi(n),$$

for each of the following input functions:

$$\psi(n) = (-1)^n n; \quad \psi(n) = n^2; \quad \psi(n) = (-2)^n n; \quad \psi(n) = 2^n.$$


Ex 3.16. Use the Finiteness Argument to show that the ring of formal power series has no zero divisors; that is, if $\gamma_1, \gamma_2$ are two power series whose product is the zero polynomial, then at least one of $\gamma_1, \gamma_2$ is the zero polynomial.
Ex 3.17. Let $\gamma_1$ and $\gamma_2$ be two formal power series in $x$. Show that $\gamma_1(\gamma_2(x))$ makes sense as a formal power series in $x$, provided the constant term of $\gamma_2$ is zero. Check that the substitution of $\gamma_2 = 1 + x$ (which has non-zero constant term) doesn’t work.


Ex 3.18. Let $\Psi(\gamma_1, \dots, \gamma_t)$ be a polynomial in finitely many power series $\gamma_1, \dots, \gamma_t$ and let $P$ be a polynomial in $x$ with zero constant term. Use the Extended Finiteness Argument to show that first expanding $\Psi$ and then substituting $P$ for $x$ results in the same power series as first substituting $P$ for $x$ and then expanding $\Psi$.


Ex 3.19 (The Quadratic Formula for Power Series). If $\alpha = \sum_{i \ge 1} a_ix^i$, where each $a_i$ is real, show that there exists a unique $\beta = \sum_{i \ge 1} b_ix^i$ such that $\beta$ satisfies $(1 + \beta)^2 = 1 + \alpha$ (and we can write $1 + \beta = \sqrt{1 + \alpha}$). Use this to prove that whenever $\gamma_1 = \sum_{i \ge 0} a_ix^i$ and $\gamma_2 = \sum_{i \ge 0} b_ix^i$ are power series with real coefficients and $a_0^2 - 4b_0 > 0$, then the equation

$$z^2 + \gamma_1 z + \gamma_2 = 0$$

has the two solutions in power series given by

$$z = \frac{-\gamma_1 \pm \sqrt{\gamma_1^2 - 4\gamma_2}}{2}.$$


The process of convolution appeared when we defined multiplication of power series, and it occurs in many other contexts in mathematics. The next sequence of exercises (Exercises 3.20 to 3.25) explains convolution and gives some examples. The definitions of a group and a ring are needed to work most of these exercises.


Ex 3.20. Show that the product of two formal power series can be defined by

$$\Big(\sum_{i \ge 0} a_ix^i\Big)\Big(\sum_{j \ge 0} b_jx^j\Big) = \sum_{n \ge 0}\Big(\sum_{i \ge 0} a_ib_{n-i}\Big)x^n,$$

where $b_{n-i}$ is considered to be zero for negative subscripts.


Ex 3.21. Let $G$ be an additive abelian group and let $\phi, \psi: G \to \mathbb{R}$ be functions that are zero for all but finitely many $g \in G$. Define the convolution $\phi * \psi$ by

$$(\phi * \psi)(g) = \sum_{h \in G}\phi(h)\psi(g - h).$$

Show that
(a) $\phi * \psi$ is well-defined (meaning that the sum above is finite);
(b) $(\phi * \psi)(g) = 0$ for all but finitely many $g \in G$;
(c) $\phi$ commutes with $\psi$; that is, $\phi * \psi = \psi * \phi$.


Ex 3.22. Let $G$ be as in Exercise 3.21 and consider the set $\mathcal{R}$ of all functions $G \to \mathbb{R}$ that are zero for all but finitely many elements of $G$.
(a) Show that $\mathcal{R}$ is a commutative ring under the operations of function addition and convolution.
(b) Show that the function that is 1 at the additive identity of $G$ and is zero elsewhere is a multiplicative identity for this ring.
(c) Either show that $\mathcal{R}$ is guaranteed to have no zero divisors or find a group $G$ for which the associated $\mathcal{R}$ does have zero divisors.


Ex 3.23. As usual, $\mathbb{R}[x]$ denotes the set of polynomials in the variable $x$ with real coefficients. For any polynomial in $\mathbb{R}[x]$, define a corresponding function on $\mathbb{N}$ whose value at $i \in \mathbb{N}$ is the $i$th coefficient of the polynomial. Use this to show that $\mathbb{R}[x]$ is the ring defined in Exercise 3.22 with $G = \mathbb{N}$.


Ex 3.24. Consider the set $\mathcal{R} = \{\phi: \mathbb{Z} \to \mathbb{R} \mid \phi(n) = 0 \text{ for all } n < 0\}$. Define the convolution of two such functions by

$$(\phi * \psi)(n) = \sum_{i \in \mathbb{N}}\phi(i)\psi(n - i).$$

Show that $\mathcal{R}$ forms a commutative ring under the operations of usual function addition and convolution. Identify an identity element for convolution. Note: This ring $\mathcal{R}$ is the ring of formal power series with real coefficients.


Ex 3.25. Let $f, g: \mathbb{R} \to \mathbb{R}$ be two integrable functions. Define the convolution of $f$ and $g$ by

$$(f * g)(t) = \int_{-\infty}^{+\infty} f(u)g(t - u)\,du.$$

Without worrying about the convergence of the integral or the integrability of the resulting function, explain how this is a continuous analog of the convolutions defined above. Show that this serves as multiplication and that the set of integrable functions is a ring. If you know enough, worry about the convergence of the integrals. If you know what the Dirac Delta function is, show that it is an identity for convolution.


We have designed the next string of exercises (Exercises 3.26 to 3.34) to give another perspective on the finiteness arguments used to verify the algebraic properties of power series. These can be seriously attempted only by a student who has had a course in the foundations of real analysis. Other students are encouraged to read through these exercises.


Ex 3.26. Let $\mathbb{R}[x]$ be the set of polynomials in $x$ with real coefficients. Given a non-zero polynomial $f$ in $\mathbb{R}[x]$, define its order $\mathrm{ord}(f)$ to be the degree of the lowest term with a non-zero coefficient. Thus, the order of a polynomial is 0 unless its constant term is zero; if the constant term is zero, the order is 1 unless there is no linear term; and so on. Show that the order of $f$ is $m \ge 1$ iff $x = 0$ is a root of $f$ with multiplicity $m$.


Ex 3.27. Pick your favorite constant $c > 1$ and define a function $|\cdot|: \mathbb{R}[x] \to \mathbb{R}$ by $|0| = 0$ and $|f| = c^{-\mathrm{ord}(f)}$ for non-zero $f$. Show that this absolute value satisfies the following familiar properties:

(a) $|f| \ge 0$ and $|f| = 0$ iff $f = 0$ (Positive definiteness).
(b) $|f + g| \le |f| + |g|$ (The triangle inequality).
(c) $|f - g| = |g - f|$ (Symmetry).
(d) $|fg| = |f||g|$ (Multiplicativeness).


Ex 3.28. Review the construction of the real numbers $\mathbb{R}$ from the rational numbers $\mathbb{Q}$ using Cauchy sequences, paying careful attention to the properties of the absolute value that are used to make this construction work. Now define a Cauchy sequence of polynomials in $\mathbb{R}[x]$ to be a sequence $(f_n)$ of polynomials such that for every $\epsilon > 0$, there is some $M$ such that $|f_n - f_m| < \epsilon$ whenever $m, n > M$ (where $|\cdot|$ is defined in the previous exercise). Next, define two Cauchy sequences $(f_n)$ and $(g_n)$ of polynomials to be equivalent if for every $\epsilon > 0$ there is an $M$ such that $|f_n - g_m| < \epsilon$ for $m, n > M$. Show that this is an equivalence relation on the set of Cauchy sequences of polynomials. To do this, you do not need to have any idea of what a Cauchy sequence looks like; the only things you need to use are the properties of the absolute value.


Ex 3.29. Show that every Cauchy sequence of polynomials is equivalent to a unique sequence $(g_d)$ of polynomials such that $\deg g_d \le d$ and $g_d = g_{d-1} + a_dx^d$ for all $d \ge 1$; in other words, a sequence such that $g_d$ is the $d$th partial sum of a formal power series in $x$. Conclude that equivalence classes of Cauchy sequences of polynomials are the same objects as formal power series.


Ex 3.30. Mimicking the construction of $\mathbb{R}$ from $\mathbb{Q}$, show how to define arithmetic operations on equivalence classes of Cauchy sequences of polynomials and prove that these are well-defined (independent of which equivalent sequences are used). Show that these operations obey the same algebraic laws as they do when applied to polynomials. (The last exercise says that these are really operations on formal power series, but be sure to work this exercise using only properties of $|\cdot|$.)


Ex 3.31. Show that the arithmetic operations defined in the last exercise 
are the same ones we defined for formal power series. 


Ex 3.32. This is a generalization of Exercise 3.27 from polynomials to power series. Define an absolute value on $S^+$ as follows. Given a non-zero sequence $(x_n)$, define the order of $(x_n)$ (which we denote by $\mathrm{ord}(x_n)$) to be the first $n$ for which $x_n \ne 0$. Choosing your favorite $c > 1$ again, let

$$\|(x_n)\| = c^{-\mathrm{ord}(x_n)} \text{ for } (x_n) \ne 0, \qquad \|0\| = 0.$$

Show that this absolute value is positive definite, symmetric, and satisfies the triangle inequality.


Ex 3.33. A space with an absolute value such as the ones above is called a metric space. It is called a complete metric space if every Cauchy sequence of elements from the space has a limit in the space. Show that $S^+$ is complete by showing that if we are given a sequence $(x^i)$ of sequences $x^i = (x^i_n)$ that is Cauchy in the sense that for every $\epsilon > 0$ there is an $I$ with the property that $\|x^i - x^j\| < \epsilon$ for $i, j > I$, then there is a sequence $y$ such that $\lim_{i \to \infty} x^i = y$. (This limit involves another $\epsilon$-$I$ definition, this time with $\|\cdot\|$.)

Ex 3.34. (This exercise requires the preceding exercises.)

(a) Show that if $\lim_{n \to \infty} f_n = \Gamma$, where the $f_n$ are polynomials in $\sigma$ and $\Gamma$ is a power series in $\sigma$, then for any sequence $s \in S^+$, we have that $\lim f_n(s)$ exists and is equal to the sequence $\Gamma(s)$ as defined in Section 3.6.2.

(b) Show that any formal power series in $\sigma$ is a linear operator on $S^+$.


Ex 3.35. Use induction to show that for all $j \ge 0$,

$$D^j\Big(\sum_{n \ge 0} x^n\Big) = \frac{j!}{(1 - x)^{j+1}},$$

where $0! = 1$ and $D^0$ is the identity operator.


Ex 3.36. Let $D^j$ denote the ordinary $j$th derivative with respect to $x$. For any formal power series $S(x) = \sum_{k \ge 0} s_kx^k$ show that for all $j \ge 1$,

$$D^j(S(x)) = \sum_{k \ge 0} s_kD^j(x^k) = j!\sum_{k \ge 0}\binom{k + j}{j}s_{k+j}x^k.$$


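Ex 3.37. For rational numbers $r, s$, use the fact that $(1 + x)^{r+s} = (1 + x)^r(1 + x)^s$ to show that the binomial coefficients satisfy

$$\binom{r + s}{k} = \sum_{i + j = k}\binom{r}{i}\binom{s}{j} \quad\text{for all } k \ge 0.$$

Also use $(1 - x^2)^r = (1 - x)^r(1 + x)^r$ to get

$$\binom{r}{k} = (-1)^k\sum_{i=0}^{2k}(-1)^i\binom{r}{i}\binom{r}{2k - i} \quad\text{for all } k \ge 0.$$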
Ex 3.38. For $L = I - 2\sigma + \sigma^2$, show that $L^{-1} = \sum_{i \ge 0}(i + 1)\sigma^i$. Show that the inverse of $I - \sigma - \sigma^2$ equals $\sum_{i \ge 0} f_{i+1}\sigma^i$, where $f_i$ is the $i$th Fibonacci number. From this calculation derive a representation of the Fibonacci numbers as sums of binomial coefficients.


Ex 3.39. Show that $D(\gamma) = 0$ implies that the power series $\gamma$ is a constant, and use this to show that $D(\gamma_1) = D(\gamma_2)$ implies that $\gamma_1 = \gamma_2 + c$ for some constant $c$. Further show that the general solution to $D(\gamma) = \sum_{n \ge 0} a_nx^n$ is $\gamma = c + \sum_{n \ge 1}\frac{a_{n-1}}{n}x^n$.


Ex 3.40 (Exponential power series). Show that $D(\gamma) = \gamma$ has the solution $\gamma = \sum_{n \ge 0}\frac{x^n}{n!}$, the Taylor series for $e^x$. Use this fact to show that $\hat\gamma = \sum_{n \ge 0}\frac{1}{n!}(-x)^n$ is the multiplicative inverse of $\gamma = \sum_{n \ge 0}\frac{x^n}{n!}$.
Hint: Multiply by $e^{-x}$.


Ex 3.41. (a) Use the product rule for formal differentiation to show that for any formal power series $\gamma$, $e^{-x}D(\gamma) - \gamma e^{-x} = D(e^{-x}\gamma)$.
(b) Solve the formal equation $D(\gamma) - \gamma = \sum_{n \ge 0}\frac{x^n}{n!}$ subject to the condition that $\gamma = 1 + \sum_{n \ge 1} a_nx^n$.
Hint: Multiply both sides by $e^{-x}$.
Generating Functions

This chapter is devoted to the study of generating functions, which we use for solving difference equations. After some motivation for the term “generating function” is given in the first section, the remainder of the chapter concentrates on using generating functions to solve recurrences. We include examples from combinatorics and number theory.

The term generating function is said to have been coined by Pierre-Simon Laplace (1749–1827), but it formed an integral part of Abraham de Moivre’s [48] 1730 paper on linear recurrences.


4.1 Counting Strings with Some Restrictions 


To introduce generating functions we begin with a combinatorial problem whose solution uses generating polynomials. For any $n \ge 0$, an ordered list of $n$ symbols from some alphabet is called an $n$-string. The only 0-string is the string with no elements, which is usually called the null string.

For the alphabet $A = \{x, y, z\}$, there are three 1-strings, the symbols in the alphabet $A$. One way to obtain a listing of all 2-strings from $A$ is to use the formal algebraic sum obtained from the product $(x + y + z)\cdot(x + y + z)$
according to the distributive law for multiplication,

$$\begin{aligned}
(x + y + z)\cdot(x + y + z) &= (x + y + z)x + (x + y + z)y + (x + y + z)z\\
&= xx + yx + zx + xy + yy + zy + xz + yz + zz.
\end{aligned} \tag{4.1}$$

Since each 2-string appears exactly once as a summand on the right side of (4.1), the polynomial $(x + y + z)\cdot(x + y + z)$ is said to generate the 2-strings. Using the commutative law and the laws of exponents, (4.1) can be compressed to

$$(x + y + z)^2 = 1x^2 + 2xy + 2xz + 2yz + 1y^2 + 1z^2. \tag{4.2}$$

Although the compressed polynomial in (4.2) is no longer a complete listing of 2-strings, it does encode some important information. For instance, the term $2xy$ indicates that there are two 2-strings with one $x$ and one $y$; $1z^2$ says that there is one 2-string that uses only the symbol $z$. In general, for all $n \ge 0$, the polynomial $P_n(x, y, z) = (x + y + z)^n$ generates all $n$-strings formed using the alphabet $A = \{x, y, z\}$, and the coefficient of $x^iy^jz^k$ in its compressed representation counts the number of $n$-strings that have $i$ $x$’s, $j$ $y$’s, and $k$ $z$’s.
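For small $n$ this counting claim can be confirmed by brute force. The Python sketch below (function names are illustrative) lists all $3^n$ $n$-strings with `itertools.product`, tallies them by their numbers of $x$’s, $y$’s and $z$’s, and compares each tally with the coefficient of $x^iy^jz^k$ in $(x + y + z)^n$, computed by the standard multinomial formula $n!/(i!\,j!\,k!)$, a fact not stated in the text.

```python
from collections import Counter
from itertools import product
from math import factorial


def tally_strings(n, alphabet="xyz"):
    """Count n-strings over the alphabet by (number of x's, y's, z's)."""
    return Counter(tuple(word.count(ch) for ch in alphabet)
                   for word in product(alphabet, repeat=n))


def multinomial(n, i, j, k):
    """Coefficient of x^i y^j z^k in (x + y + z)^n."""
    return factorial(n) // (factorial(i) * factorial(j) * factorial(k))


n = 4
tally = tally_strings(n)
print(all(count == multinomial(n, *key) for key, count in tally.items()))  # True
print(sum(tally.values()) == 3 ** n)                                       # True: P_n(1,1,1) = 3^n
print(tally_strings(2)[(1, 1, 0)])                                         # 2, the term 2xy in (4.2)
```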

When $x = 1, y = 1, z = 1$ is substituted into $P_n(x, y, z)$, each summand is 1, and accordingly, $P_n(1, 1, 1) = 3^n$, the total number of $n$-strings. The values of $P_n(x, y, z)$ for other choices of the variables $x, y, z$ can be used to count the number of strings with certain characteristics. For instance, suppose we want to calculate the four quantities $s_{00}^{(n)}, s_{01}^{(n)}, s_{10}^{(n)}, s_{11}^{(n)}$, where each $s_{ij}^{(n)}$ equals the number of $n$-strings in which the number of $y$’s is congruent to $i \pmod 2$ and the number of $z$’s is congruent to $j \pmod 2$. For example, for $n = 2$, from (4.1) we obtain

$$s_{00}^{(2)} = |\{xx, yy, zz\}| = 3; \qquad s_{01}^{(2)} = |\{zx, xz\}| = 2;$$
$$s_{10}^{(2)} = |\{yx, xy\}| = 2; \qquad s_{11}^{(2)} = |\{zy, yz\}| = 2.$$

Since each $n$-string is counted in exactly one of the numbers $s_{00}^{(n)}, s_{01}^{(n)}, s_{10}^{(n)}, s_{11}^{(n)}$,

$$3^n = (1 + 1 + 1)^n = P_n(1, 1, 1) = s_{00}^{(n)} + s_{01}^{(n)} + s_{10}^{(n)} + s_{11}^{(n)} \tag{4.3}$$

holds for all $n \ge 0$. When we substitute $x = 1, y = -1, z = 1$ into $P_n(x, y, z)$, each string with an odd number of $y$’s is counted as negative. Since strings with an odd number of $y$’s are counted in $s_{10}^{(n)}$ and $s_{11}^{(n)}$,

$$1 = (1 - 1 + 1)^n = P_n(1, -1, 1) = s_{00}^{(n)} + s_{01}^{(n)} - s_{10}^{(n)} - s_{11}^{(n)}. \tag{4.4}$$

Similarly,

$$1 = (1 + 1 - 1)^n = P_n(1, 1, -1) = s_{00}^{(n)} - s_{01}^{(n)} + s_{10}^{(n)} - s_{11}^{(n)} \tag{4.5}$$

and

$$(-1)^n = (1 - 1 - 1)^n = P_n(1, -1, -1) = s_{00}^{(n)} - s_{01}^{(n)} - s_{10}^{(n)} + s_{11}^{(n)}. \tag{4.6}$$

The equations in (4.3) through (4.6) form a nonsingular system of four equations in four unknowns that can be solved to obtain all of $s_{00}^{(n)}, s_{01}^{(n)}, s_{10}^{(n)}, s_{11}^{(n)}$.


For example, consider the question of finding a formula for $s_{00}^{(n)}$, the number of $n$-strings that have an even number of $y$’s and an even number of $z$’s. Actually identifying all $n$-strings with this property would be a daunting task for large $n$. On the other hand, summing the four equations (4.3) through (4.6) results in

$$4s_{00}^{(n)} = 3^n + 2 + (-1)^n, \tag{4.7}$$

and we have the exact formula

$$s_{00}^{(n)} = \frac{3^n + 2 + (-1)^n}{4}, \tag{4.8}$$

obtained without listing all permissible strings. Checking this formula for $n = 2$, we see that $xx, yy, zz$ is a complete listing of permissible strings and $(3^2 + 2 + (-1)^2)/4$ does equal 3. The values of $P_n(1, y, z)$ for $y, z \in \{\pm 1\}$ allowed us to do this computation.
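Formula (4.8), and the sign patterns in (4.4) and (4.6), can be confirmed by brute force for small $n$. In this Python sketch (the function name is hypothetical) the dictionary key $(i, j)$ stands for $s_{ij}^{(n)}$, that is, the parities of the numbers of $y$’s and $z$’s.

```python
from itertools import product


def parity_counts(n):
    """{(i, j): s_ij^(n)}, where i and j are the parities of the numbers of y's and z's."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for word in product("xyz", repeat=n):
        counts[(word.count("y") % 2, word.count("z") % 2)] += 1
    return counts


print(parity_counts(2))   # {(0, 0): 3, (0, 1): 2, (1, 0): 2, (1, 1): 2}
for n in range(9):
    c = parity_counts(n)
    assert c[(0, 0)] == (3 ** n + 2 + (-1) ** n) // 4                  # (4.8)
    assert c[(0, 0)] + c[(0, 1)] - c[(1, 0)] - c[(1, 1)] == 1          # (4.4)
    assert c[(0, 0)] - c[(0, 1)] - c[(1, 0)] + c[(1, 1)] == (-1) ** n  # (4.6)
print("formula (4.8) confirmed for n = 0..8")
```

Brute force needs all $3^n$ strings, which is exactly the listing that (4.8) lets us avoid.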
We close this section with an indication of how recurrence techniques from earlier chapters can also be used to calculate $s_{00}^{(n)}$. To get a recurrence that relates $s_{00}^{(n)}$ to some $s_{ij}^{(m)}$ for earlier $m < n$, we decompose each $n$-string into two substrings: its first character and the remaining string of length $n - 1$. Any permissible $n$-string has one of three forms: its first character is $x$ and the rest is a permissible $(n - 1)$-string; its first character is $y$ and the remaining $(n - 1)$-string has an odd number of $y$’s and an even number of $z$’s; its first character is $z$ and the remaining $(n - 1)$-string has an even number of $y$’s and an odd number of $z$’s. Among the $n$-strings counted in $s_{00}^{(n)}$ there are $s_{00}^{(n-1)}$ strings of the first type, $s_{10}^{(n-1)}$ of the second type, and $s_{01}^{(n-1)}$ of the third type. Therefore,

$$s_{00}^{(n)} = s_{00}^{(n-1)} + s_{10}^{(n-1)} + s_{01}^{(n-1)}. \tag{4.9}$$

Similar arguments give

$$s_{01}^{(n)} = s_{01}^{(n-1)} + s_{11}^{(n-1)} + s_{00}^{(n-1)}, \tag{4.10}$$
$$s_{10}^{(n)} = s_{10}^{(n-1)} + s_{00}^{(n-1)} + s_{11}^{(n-1)}, \tag{4.11}$$
$$s_{11}^{(n)} = s_{11}^{(n-1)} + s_{01}^{(n-1)} + s_{10}^{(n-1)}. \tag{4.12}$$

This approach to the problem is completed in Exercise 4.2.
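Iterating (4.9) through (4.12), starting from the single 0-string, reproduces the values given by the closed form (4.8). In this Python sketch the tuple `(s00, s01, s10, s11)` holds $s_{00}^{(n)}, s_{01}^{(n)}, s_{10}^{(n)}, s_{11}^{(n)}$; the function name is illustrative.

```python
def iterate_counts(n):
    """(s00, s01, s10, s11) for n-strings, by applying (4.9)-(4.12) n times."""
    s00, s01, s10, s11 = 1, 0, 0, 0     # n = 0: only the null string (no y's, no z's)
    for _ in range(n):
        s00, s01, s10, s11 = (s00 + s10 + s01,   # (4.9): first character x, y, z
                              s01 + s11 + s00,   # (4.10)
                              s10 + s00 + s11,   # (4.11)
                              s11 + s01 + s10)   # (4.12)
    return s00, s01, s10, s11


print(iterate_counts(2))   # (3, 2, 2, 2)
print(all(iterate_counts(n)[0] == (3 ** n + 2 + (-1) ** n) // 4 for n in range(30)))  # True
```

Unlike brute-force enumeration, each step costs a constant amount of work, so large $n$ is no problem.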
4.2 An Overview of the Generating Function Technique

We’ve just shown how the polynomial $P_n(x, y, z) = (x + y + z)^n$ can be used to generate all $n$-strings from the alphabet $A = \{x, y, z\}$. The remainder of this chapter will consider generating functions in one variable. For a more comprehensive discussion of generating functions and their applications the reader is referred to [169]. Also, in [152] Richard Stanley gives a survey of generating functions as used in combinatorics.

We first consider the simplest case, polynomial generating functions in one variable. For this, recall that a polynomial in the variable $x$ with complex coefficients can be written in the form $p(x) = s_0 + s_1x + \cdots + s_rx^r$ for some complex constants $s_0, s_1, \dots, s_r$. Complete knowledge of the polynomial is equivalent to knowledge of its coefficients, the sequence $s_0, s_1, \dots, s_r$. In this way $p(x)$ can be considered to be the generating polynomial for the finite sequence $(s_0, s_1, \dots, s_r)$, and conversely the sequence $(s_0, s_1, \dots, s_r)$ is the sequence generated by the polynomial $p(x)$. For example, the Binomial Theorem says that

$$(1 + x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k \quad\text{for any positive integer } n,$$

where the binomial coefficient $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$ is the number of combinations of $n$ things taken $k$ at a time, and the polynomial $p_n(x) = (1 + x)^n$ records the sequence $\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$ (and so each $p_n(x)$ can be viewed as the generating function for the $n$th row of Pascal’s triangle). Because the solution of an initial value problem is an infinite sequence, its generating function is an infinite series rather than a polynomial unless the solution sequence is eventually zero.

The Generating Function of a Sequence
The generating function for an infinite sequence $(s_n)$ is the formal power series

$$\sum_{n \ge 0} s_nx^n.$$

When the sequence is finite, say $(s_0, s_1, \dots, s_r)$, its generating function is the polynomial

$$s_0 + s_1x + \cdots + s_rx^r.$$


As in Chapter 3, a generating function is a power series, an algebraic object that can be formally manipulated using algebraic operations. The sum and product of the generating functions $S(x) = \sum_{n\ge0}s_nx^n$ and $T(x) = \sum_{n\ge0}t_nx^n$ are, respectively,
$$S(x) + T(x) = \sum_{n\ge0}(s_n + t_n)x^n \quad\text{and}\quad S(x)T(x) = \sum_{n\ge0}\Big(\sum_{j=0}^{n}s_jt_{n-j}\Big)x^n. \tag{4.13}$$
Remember that a formal power series is never evaluated at a specific value $x = a$ unless it can be shown that the series is more than purely formal by proving it converges on an open disk containing $a$.
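Because each coefficient in (4.13) depends on only finitely many coefficients of $S$ and $T$, the operations can be carried out exactly on truncated coefficient lists. A minimal sketch, assuming both series are truncated to the same number of terms (function names are our own):

```python
def series_add(s, t):
    """Coefficientwise sum of two truncated power series of equal length."""
    return [a + b for a, b in zip(s, t)]

def series_mul(s, t):
    """Cauchy product (4.13), truncated to the common length:
    the coefficient of x^n is sum_{j=0}^{n} s_j * t_{n-j}."""
    n_terms = min(len(s), len(t))
    return [sum(s[j] * t[n - j] for j in range(n + 1)) for n in range(n_terms)]

# 1/(1 - x) has every coefficient equal to 1; its square is sum (n + 1) x^n.
ones = [1] * 6
print(series_mul(ones, ones))  # [1, 2, 3, 4, 5, 6]
```

The first $N$ coefficients of the product are exact even though only $N$ coefficients of each factor are stored, which is why truncation is safe here.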

In this chapter we develop the generating function technique for solving 
recurrences, and the following examples demonstrate the method. 


Example 4.2.1. (The general first-order homogeneous recurrence)
For $\lambda \in \mathbb{C}$, the generating function $S(x)$ for solutions to $s_{n+1} = \lambda s_n$ satisfies
$$S(x) = s_0 + \sum_{n\ge0}s_{n+1}x^{n+1} = s_0 + x\sum_{n\ge0}\lambda s_nx^n = s_0 + \lambda xS(x),$$
which can be solved for $S(x)$ to get
$$S(x) = \frac{s_0}{1-\lambda x}. \tag{4.14}$$
By Theorem 3.6.3,
$$S(x) = \frac{s_0}{1-\lambda x} = \sum_{n\ge0}s_0\lambda^nx^n,$$
and equality of formal power series gives $s_n = s_0\lambda^n$, as expected. It is worthwhile to take a minute to summarize the steps in this solution. We first used the recurrence to find a rational form for the generating function and then “expanded” this rational function into another power series. The solution sequence is the sequence of coefficients of this power series.
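The “expansion” step can be done mechanically for any rational function whose denominator has a nonzero constant term. A minimal sketch, assuming that condition: comparing coefficients in $q(x)S(x) = p(x)$ gives $s_n = \big(p_n - \sum_{j\ge1} q_js_{n-j}\big)/q_0$. The name `expand_rational` is our own.

```python
from fractions import Fraction

def expand_rational(p, q, n_terms):
    """First n_terms coefficients of p(x)/q(x) as a formal power series.

    p and q are coefficient lists [c0, c1, ...] and q[0] must be nonzero.
    Each coefficient is read off from q(x) * S(x) = p(x).
    """
    s = []
    for n in range(n_terms):
        acc = Fraction(p[n]) if n < len(p) else Fraction(0)
        for j in range(1, min(n, len(q) - 1) + 1):
            acc -= q[j] * s[n - j]
        s.append(acc / q[0])
    return s

# Example 4.2.1 with s0 = 5 and lambda = 2: S(x) = 5 / (1 - 2x), so s_n = 5 * 2^n.
print(expand_rational([5], [1, -2], 6))  # coefficients 5, 10, 20, 40, 80, 160 (as Fractions)
```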


Example 4.2.2. (The Fibonacci sequence) The Fibonacci generating function is the formal power series $F(x) = \sum_{n\ge0}f_nx^n$, which satisfies
$$F(x) = f_0 + f_1x + \sum_{n\ge2}f_nx^n = x + \sum_{n\ge2}(f_{n-1} + f_{n-2})x^n.$$
At this point a creative re-indexing (using $f_0 = 0$) can be performed to get
$$F(x) = x + x\sum_{n\ge1}f_nx^n + x^2\sum_{n\ge0}f_nx^n = x + x\sum_{n\ge0}f_nx^n + x^2\sum_{n\ge0}f_nx^n,$$
which can be rewritten as
$$F(x) = x + xF(x) + x^2F(x).$$
Collecting the terms containing $F(x)$ and solving, we obtain
$$F(x) = \frac{x}{1-x-x^2}, \tag{4.15}$$
which is more complicated than the same stage (4.14) of the last example, and we use the technique of partial fractions from calculus to decompose $F(x)$ into the sum of two generating functions, each of which is a geometric power series. (The general technique of partial fractions is developed in Section 4.3.)

In order to decompose the rational function on the right side of (4.15) into partial fractions, we find the roots of $1 - x - x^2 = 0$, which are
$$\lambda_1^{-1} = \frac{-1+\sqrt5}{2} \quad\text{and}\quad \lambda_2^{-1} = \frac{-1-\sqrt5}{2}, \qquad\text{where } \lambda_1 = \frac{1+\sqrt5}{2},\ \lambda_2 = \frac{1-\sqrt5}{2},$$
and from $\lambda_1^{-1}\lambda_2^{-1} = -1$,
$$1 - x - x^2 = -(\lambda_1^{-1} - x)(\lambda_2^{-1} - x) = (1-\lambda_1x)(1-\lambda_2x).$$
By partial fractions there exist constants $A, B$ such that
$$\frac{x}{1-x-x^2} = \frac{A}{1-\lambda_1x} + \frac{B}{1-\lambda_2x}. \tag{4.16}$$
Before finding the constants $A$ and $B$, we note that Theorem 3.6.3 allows us to write the right side of (4.16) as the formal power series
$$F(x) = \sum_{n\ge0}(A\lambda_1^n + B\lambda_2^n)x^n, \tag{4.17}$$
and the uniqueness of formal power series implies
$$f_n = A\lambda_1^n + B\lambda_2^n \quad\text{for all } n \ge 0. \tag{4.18}$$
To find $A$ and $B$ return to (4.16). Since
$$(1-\lambda_1x)(1-\lambda_2x) = 1 - x - x^2,$$
we have
$$\frac{x}{1-x-x^2} = \frac{A(1-\lambda_2x) + B(1-\lambda_1x)}{1-x-x^2}.$$
Because these two rational expressions for $F(x)$ have the same denominator, the numerators must also be equal, giving the polynomial equation
$$x = A(1-\lambda_2x) + B(1-\lambda_1x).$$
As polynomials this implies
$$0 = A + B \quad\text{and}\quad 1 = -A\lambda_2 - B\lambda_1,$$
so $A = -B = 1/(\lambda_1 - \lambda_2) = 1/\sqrt5$, and (4.18) becomes Binet’s Formula
$$f_n = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^n - \left(\frac{1-\sqrt5}{2}\right)^n\right].$$
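A quick numerical sanity check of Binet’s Formula against the recurrence, as a sketch: floating-point evaluation of $\lambda_1^n$ and $\lambda_2^n$ is only approximate, so the formula’s value is rounded to the nearest integer (an assumption that is safe for moderate $n$ in double precision).

```python
from math import sqrt

def fib(n):
    """Fibonacci numbers from f_{n+2} = f_{n+1} + f_n with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def binet(n):
    """Binet's Formula, rounded to absorb floating-point error."""
    lam1 = (1 + sqrt(5)) / 2
    lam2 = (1 - sqrt(5)) / 2
    return round((lam1 ** n - lam2 ** n) / sqrt(5))

for n in range(40):
    assert binet(n) == fib(n)
```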


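Example 4.2.3. Consider the recurrence
$$s_0 = 2,\quad s_1 = 3,\quad s_{n+2} = 6s_{n+1} - 9s_n. \tag{4.21}$$
Its generating function is
$$\begin{aligned}S(x) = \sum_{n\ge0}s_nx^n &= s_0 + s_1x + \sum_{n\ge0}s_{n+2}x^{n+2}\\ &= 2 + 3x + \sum_{n\ge0}(6s_{n+1} - 9s_n)x^{n+2}\\ &= 2 + 3x + 6x\sum_{n\ge0}s_{n+1}x^{n+1} - 9x^2\sum_{n\ge0}s_nx^n\\ &= 2 + 3x + 6x\sum_{n\ge1}s_nx^n - 9x^2\sum_{n\ge0}s_nx^n\\ &= 2 + 3x + 6x(S(x) - 2) - 9x^2S(x),\end{aligned}$$
and
$$(1 - 6x + 9x^2)S(x) = 2 - 9x. \tag{4.22}$$
Therefore,
$$S(x) = \frac{2-9x}{1-6x+9x^2} = \frac{2-9x}{(1-3x)^2} = \frac{A}{1-3x} + \frac{B}{(1-3x)^2},$$
and we can solve to get $A = 3$, $B = -1$. Using this and (3.17),
$$S(x) = \frac{3}{1-3x} - \frac{1}{(1-3x)^2} = \sum_{n\ge0}3\cdot3^nx^n - \sum_{n\ge0}(n+1)3^nx^n,$$
and
$$s_n = 3^{n+1} - (n+1)3^n = (2-n)3^n.$$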
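To confirm the closed form $s_n = (2-n)3^n$, one can iterate (4.21) directly and compare; this is only a spot check of finitely many terms, not a proof. The function name is hypothetical.

```python
def solve_4_21(n_terms):
    """Iterate (4.21): s0 = 2, s1 = 3, s_{n+2} = 6 s_{n+1} - 9 s_n."""
    s = [2, 3]
    while len(s) < n_terms:
        s.append(6 * s[-1] - 9 * s[-2])
    return s[:n_terms]

closed_form = [(2 - n) * 3 ** n for n in range(10)]
assert solve_4_21(10) == closed_form
print(closed_form[:5])  # [2, 3, 0, -27, -162]
```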


As noted earlier, the first step in these three examples was to represent the generating function as a rational function. In each case its denominator bears a strong resemblance to the characteristic polynomial of the recurrence; namely, it is the reciprocal polynomial
$$\mathrm{ch}^R(x) = 1 - c_1x - \cdots - c_kx^k$$
of the characteristic polynomial
$$\mathrm{ch}(x) = x^k - c_1x^{k-1} - \cdots - c_k.$$
Before proving the general result, we consider one more example, a nonhomogeneous equation
$$s_0 = 1,\quad s_1 = 4,\quad s_{n+2} = 6s_{n+1} - 9s_n + 4(n-1). \tag{4.24}$$
As in the last example,
$$\begin{aligned}S(x) &= 1 + 4x + \sum_{n\ge0}s_{n+2}x^{n+2}\\ &= 1 + 4x + 6x\sum_{n\ge1}s_nx^n - 9x^2\sum_{n\ge0}s_nx^n + 4x^3\sum_{n\ge0}nx^{n-1} - 4x^2\sum_{n\ge0}x^n.\end{aligned}$$
At this point we want the rational representation of $\sum_{n\ge0}nx^{n-1}$, which is a special case ($\lambda = 1$, $m = 2$) of the following formula proved in Chapter 3:
$$\frac{1}{(1-\lambda x)^m} = \sum_{n\ge0}\binom{n+m-1}{n}\lambda^nx^n \quad\text{for any integer } m \ge 1. \tag{4.25}$$
From this,
$$S(x) = 1 + 4x + 6x(S(x) - 1) - 9x^2S(x) + \frac{4x^3}{(1-x)^2} - \frac{4x^2}{1-x},$$
which gives
$$(1 - 6x + 9x^2)S(x) = 1 - 2x + \frac{4x^3}{(1-x)^2} - \frac{4x^2}{1-x} = \frac{(1-3x)(1-x-2x^2)}{(1-x)^2} \tag{4.26}$$
and
$$S(x) = \frac{1-x-2x^2}{(1-x)^2(1-3x)}.$$
(Here the denominator is not the reciprocal polynomial, because the input function contributes a factor of $(1-x)^2$ and allows the cancellation of $1-3x$.) The solution is completed as above, with the partial fraction decomposition
$$S(x) = \frac{1}{1-3x} + \frac{1}{(1-x)^2} - \frac{1}{1-x},$$
yielding the solution
$$s_n = 3^n + (n+1) - 1 = 3^n + n.$$

It is interesting to note that because $1-3x$ appears only to the first power in the denominator of the rational function in this example, the solution sequence is actually a solution to a first-order recurrence with $\mathrm{ch}(x) = x - 3$. Explicitly, we see that $(s_n)$ is a solution to $s_{n+1} = 3s_n + 1 - 2n$, since
$$s_{n+1} - 3s_n = (3^{n+1} + n + 1) - 3(3^n + n) = 1 - 2n.$$
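A short check of both claims in this example, sketched under the assumption that agreement on the first dozen terms is a reasonable sanity test: iterating (4.24) reproduces $s_n = 3^n + n$, and the same terms satisfy the first-order recurrence $s_{n+1} = 3s_n + 1 - 2n$.

```python
def solve_4_24(n_terms):
    """Iterate (4.24): s0 = 1, s1 = 4, s_{n+2} = 6 s_{n+1} - 9 s_n + 4(n - 1)."""
    s = [1, 4]
    for n in range(n_terms - 2):
        s.append(6 * s[n + 1] - 9 * s[n] + 4 * (n - 1))
    return s[:n_terms]

s = solve_4_24(12)
# Closed form obtained from the partial fraction decomposition.
assert s == [3 ** n + n for n in range(12)]
# The same sequence also satisfies the first-order recurrence s_{n+1} = 3 s_n + 1 - 2n.
assert all(s[n + 1] == 3 * s[n] + 1 - 2 * n for n in range(11))
```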
4.2.1 Rational representation

We’re now ready to determine the rational form that encodes an initial value problem for
$$s_{n+k} = c_1s_{n+k-1} + c_2s_{n+k-2} + \cdots + c_ks_n + \psi(n), \tag{L}$$
where $\psi(n)$ is the input sequence. Let $S(x) = \sum_{n\ge0}s_nx^n$ and $F(x) = \sum_{n\ge0}\psi(n)x^n$ be the generating functions of $(s_n)$ and $(\psi(n))$, respectively. Then
$$\begin{aligned}S(x)\,\mathrm{ch}^R(x) &= S(x)(1 - c_1x - \cdots - c_kx^k)\\ &= \sum_{n\ge0}s_nx^n - c_1\sum_{n\ge0}s_nx^{n+1} - \cdots - c_k\sum_{n\ge0}s_nx^{n+k}\\ &= \sum_{n\ge0}s_nx^n - c_1\sum_{n\ge1}s_{n-1}x^n - \cdots - c_k\sum_{n\ge k}s_{n-k}x^n\\ &= \sum_{n\ge k}(s_n - c_1s_{n-1} - \cdots - c_ks_{n-k})x^n + \sum_{n=0}^{k-1}s_nx^n\\ &\quad - c_1\sum_{n=1}^{k-1}s_{n-1}x^n - \cdots - c_{k-2}\sum_{n=k-2}^{k-1}s_{n-k+2}x^n - c_{k-1}s_0x^{k-1}.\end{aligned}$$
The first summand is
$$\sum_{n\ge0}(s_{k+n} - c_1s_{k+n-1} - \cdots - c_ks_n)x^{k+n} = x^k\sum_{n\ge0}\psi(n)x^n = x^kF(x),$$
which depends only on $F(x)$ and $k$. The remainder of the expression for $S(x)\,\mathrm{ch}^R(x)$ contains the information about the initial conditions, and is the polynomial
$$\begin{aligned}&s_0 + s_1x + s_2x^2 + \cdots + s_{k-1}x^{k-1}\\ &\quad - c_1s_0x - c_1s_1x^2 - \cdots - c_1s_{k-2}x^{k-1}\\ &\quad - c_2s_0x^2 - \cdots - c_2s_{k-3}x^{k-1}\\ &\quad\ \cdots\\ &\quad - c_{k-1}s_0x^{k-1}.\end{aligned}$$
Writing this polynomial as
$$d(x) = d_0 + d_1x + \cdots + d_{k-1}x^{k-1}$$
gives $d_j = s_j - c_1s_{j-1} - \cdots - c_js_0$, and in matrix-vector form
$$(d_{k-1}, \ldots, d_0)^T = M_0S_0 = \begin{pmatrix}1 & -c_1 & -c_2 & \cdots & -c_{k-1}\\ 0 & 1 & -c_1 & \cdots & -c_{k-2}\\ 0 & 0 & 1 & \cdots & -c_{k-3}\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\end{pmatrix}\begin{pmatrix}s_{k-1}\\ s_{k-2}\\ s_{k-3}\\ \vdots\\ s_0\end{pmatrix}, \tag{4.27}$$
where $S_0$ is the column vector of initial conditions and $M_0$ is the above $k\times k$ matrix, which depends only on the coefficients of the difference equation (L). This gives the following rational representation of the generating function for (L).

The Rational Representation
If $F(x)$ is the generating function of $(\psi(n))$, then the generating function of the solution to (L) with initial values $S_0$ is
$$S(x) = \frac{d(x) + x^kF(x)}{\mathrm{ch}^R(x)}, \tag{4.28}$$
where $d(x)$ is the polynomial with coefficients $(d_{k-1}, \ldots, d_0)^T = M_0S_0$.
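The rational representation can be checked numerically. The sketch below is restricted, for simplicity, to the homogeneous case $\psi = 0$, so that (4.28) reads $S(x) = d(x)/\mathrm{ch}^R(x)$; it computes $d(x)$ from $d_j = s_j - c_1s_{j-1} - \cdots - c_js_0$ (the entries of $M_0S_0$ listed from $d_0$ upward), expands the quotient, and compares with direct iteration. All function names are hypothetical.

```python
from fractions import Fraction

def initial_polynomial(c, init):
    """[d_0, ..., d_{k-1}] with d_j = s_j - c_1 s_{j-1} - ... - c_j s_0.

    c = [c_1, ..., c_k] are the coefficients of (L); init = [s_0, ..., s_{k-1}].
    """
    k = len(c)
    return [init[j] - sum(c[i - 1] * init[j - i] for i in range(1, j + 1)) for j in range(k)]

def expand_rational(p, q, n_terms):
    """First n_terms power-series coefficients of p(x)/q(x), assuming q[0] != 0."""
    s = []
    for n in range(n_terms):
        acc = Fraction(p[n]) if n < len(p) else Fraction(0)
        for j in range(1, min(n, len(q) - 1) + 1):
            acc -= q[j] * s[n - j]
        s.append(acc / q[0])
    return s

def iterate(c, init, n_terms):
    """Direct iteration of s_{n+k} = c_1 s_{n+k-1} + ... + c_k s_n."""
    s = list(init)
    while len(s) < n_terms:
        s.append(sum(ci * s[-i] for i, ci in enumerate(c, start=1)))
    return s[:n_terms]

# Fibonacci: c = [1, 1] and (s_0, s_1) = (0, 1) give d(x) = x and ch^R(x) = 1 - x - x^2.
c, init = [1, 1], [0, 1]
d = initial_polynomial(c, init)
chR = [1] + [-ci for ci in c]
assert d == [0, 1]
assert expand_rational(d, chR, 15) == iterate(c, init, 15)
```

For the example (4.24), `initial_polynomial([6, -9], [1, 4])` returns `[1, -2]`, i.e. $d(x) = 1 - 2x$, matching $M_0S_0 = (-2, 1)^T$.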


From this we see that the rational representation is a rational function (a ratio of polynomials) if and only if the generating function for the input function can be written as a rational function.

Returning to the example in (4.24),
$$F(x) = 4\sum_{n\ge0}(n-1)x^n = 4x\sum_{n\ge0}nx^{n-1} - 4\sum_{n\ge0}x^n = \frac{4x}{(1-x)^2} - \frac{4}{1-x} = \frac{4(2x-1)}{(1-x)^2}$$
and $M_0S_0 = (-2, 1)^T$ give
$$S(x) = \frac{1 - 2x + \dfrac{4x^2(2x-1)}{(1-x)^2}}{(1-3x)^2} = \frac{(1-3x)(1-x-2x^2)}{(1-x)^2(1-3x)^2},$$
as in (4.26), since $\mathrm{ch}^R(x) = (1-3x)^2$.


4.3 A Review of Partial Fractions


If you’ve taken calculus, you’ve probably seen the technique of partial fractions, which is used there to decompose a rational function into a sum of rational functions that are simpler to integrate. For recurrences whose generating function $S(x)$ is a ratio of polynomials, the partial fraction decomposition allows us to write $S(x)$ as a sum of formal power series that are in the form given in (4.25). The fact that a partial fraction expansion exists is an algebraic result that we prove here.

Assume that $p$ and $q$ are polynomials with $\deg(p) < \deg(q)$ and that the leading coefficient of $q(x)$ equals 1. Such a polynomial is called a monic polynomial. From the Fundamental Theorem of Algebra (refer to Appendix B) we know that any polynomial with complex coefficients can be factored into linear polynomials,
$$q(x) = (x-\rho_1)^{m_1}\cdots(x-\rho_t)^{m_t}$$
for different elements $\rho_1, \ldots, \rho_t$ of $\mathbb{C}$. The first step in obtaining a useful partial fraction decomposition will be to show that there exist polynomials $A_1(x), \ldots, A_t(x)$ such that
$$\frac{p(x)}{q(x)} = \frac{A_1(x)}{(x-\rho_1)^{m_1}} + \cdots + \frac{A_t(x)}{(x-\rho_t)^{m_t}} \tag{4.29}$$
and each $\deg(A_i) < m_i$. We will show not only that such polynomials $A_i(x)$ exist and are unique; our proof also gives a constructive method for finding them. Once the initial decomposition in (4.29) is found, each $A_i(x)$ is expressed as a polynomial in $x - \rho_i$, say $A_i(x) = \alpha_1(x-\rho_i)^{m-1} + \cdots + \alpha_{m-1}(x-\rho_i) + \alpha_m$ (for $m = m_i$), and then the summand $A_i(x)/(x-\rho_i)^m$ is expanded as
$$\frac{A_i(x)}{(x-\rho_i)^m} = \frac{\alpha_1}{x-\rho_i} + \frac{\alpha_2}{(x-\rho_i)^2} + \cdots + \frac{\alpha_m}{(x-\rho_i)^m},$$
where $\alpha_1, \alpha_2, \ldots, \alpha_m$ are now elements of $\mathbb{C}$. The partial fraction decomposition we use is found by replacing each of the summands in (4.29) by this finer decomposition. This is helpful, since each of the summands now has the form given in (4.25), and the formula can be applied to obtain the power series for the generating function $p(x)/q(x)$.

It remains to prove that (4.29) holds. First we note that when $t = 1$, $p(x)/q(x) = p(x)/(x-\rho_1)^{m_1}$ already has the form in (4.29). Our proof for the case in which $q$ has more than one root uses the Euclidean Algorithm for polynomials, which is similar to the Euclidean Algorithm for integers that we discuss later, in Chapters 7 and 8. This algorithm relies on the fact that every nonzero element of $\mathbb{C}$ has an inverse, and so the Division Algorithm can be applied to any pair of nonzero polynomials $a, b$ to give unique polynomials $Q_1, R_1$ for which
$$a(x) = Q_1(x)b(x) + R_1(x) \quad\text{with } \deg(R_1) < \deg(b).$$
Once the Division Algorithm has been applied once, the process can be continued by dividing $b(x)$ by $R_1(x)$, and continuing in this way a sequence of polynomials $(R_j(x))$ is obtained. Since $\deg(R_1), \deg(R_2), \ldots$ is a strictly decreasing sequence of natural numbers, the process ends after finitely many steps with $R_{n+1}(x) = 0$. In this way we have constructed two finite sequences of polynomials, $Q_1, \ldots, Q_{n+1}$ and $R_1, \ldots, R_n$, such that
$$\begin{aligned}a(x) &= Q_1(x)b(x) + R_1(x),\\ b(x) &= Q_2(x)R_1(x) + R_2(x),\\ &\ \,\vdots\\ R_{n-2}(x) &= Q_n(x)R_{n-1}(x) + R_n(x),\\ R_{n-1}(x) &= Q_{n+1}(x)R_n(x) + 0.\end{aligned}$$
This sequence of equations is the Euclidean Algorithm for polynomials. From the last equation, we note that $R_n$ divides $R_{n-1}$. Using this information in the previous equation, $R_n$ also divides $R_{n-2}$, and continuing back we find that $R_n$ divides both $a$ and $b$, and so $R_n$ is a common divisor of the original polynomials $a$ and $b$. (In Exercise 4.7 you show that $R_n$ is the gcd of $a$ and $b$, since it has the largest degree among all polynomials that are common divisors.) Also, we can use the equations in reverse to obtain polynomials $B_1$ and $B_2$ such that
$$B_1(x)b(x) + B_2(x)a(x) = R_n(x).$$

Let us now return to the problem at hand, justification of the existence of unique polynomials $A_1(x), \ldots, A_t(x)$ with $\deg(A_i) < m_i$ such that (4.29) holds. As we have already mentioned, there is nothing to prove when $t = 1$. For $t > 1$, the polynomials $b(x) = (x-\rho_2)^{m_2}\cdots(x-\rho_t)^{m_t}$ and $a(x) = (x-\rho_1)^{m_1}$ have no common factors, and so the last nonzero remainder when the Euclidean Algorithm is applied to $a$ and $b$ is a nonzero constant polynomial, $R_n(x) = \alpha \in \mathbb{C}$. Also, the equation
$$B_1(x)b(x) + B_2(x)a(x) = \alpha$$
can be multiplied by $\alpha^{-1}p(x)$ to obtain the two polynomials $C_1(x) = \alpha^{-1}p(x)B_1(x)$ and $C_2(x) = \alpha^{-1}p(x)B_2(x)$ such that
$$C_1(x)b(x) + C_2(x)a(x) = p(x), \tag{4.30}$$
giving
$$\frac{C_1(x)}{(x-\rho_1)^{m_1}} + \frac{C_2(x)}{b(x)} = \frac{p(x)}{q(x)}.$$
This process can then be inductively continued using $C_2(x)/b(x)$ to obtain polynomials $A_2(x), \ldots, A_t(x)$ such that (4.29) holds, provided $\deg(C_2) < \deg(b)$. For this, we note that by the Division Algorithm there exists a polynomial $t(x)$ such that $A_1 = C_1 - ta$ has small degree, $\deg(A_1) < \deg(a)$. Substituting $A_1$ and $p_1 = C_2 + tb$ in (4.30), we have
$$A_1(x)b(x) + p_1(x)a(x) = p(x).$$
Then
$$\deg(A_1b) < \deg(a) + \deg(b) = \deg(q) \quad\text{and}\quad \deg(p) < \deg(q)$$
combine with
$$\deg(p_1) + \deg(a) = \deg(p_1a) \le \max\{\deg(p), \deg(A_1b)\} < \deg(q)$$
to give $\deg(p_1) < \deg(b)$, as required (so $p_1$ can play the role of $C_2$).
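The Euclidean Algorithm for polynomials, together with the bookkeeping that “uses the equations in reverse,” can be sketched with exact rational arithmetic. This is our own illustration, not the book’s code: instead of literally back-substituting, it updates $B_1, B_2$ at each division so that every remainder stays of the form $B_1b + B_2a$ (the usual extended-Euclid formulation). Polynomials are coefficient lists, lowest degree first, and all helper names are hypothetical.

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first: [1, -2, 1] is 1 - 2x + x^2.

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divide(a, b):
    """Division Algorithm: return (Q, R) with a = Q*b + R and deg(R) < deg(b); b != 0."""
    b = trim(b)
    quotient, r = [], trim(a)
    while r and len(r) >= len(b):
        shift = len(r) - len(b)
        coef = Fraction(r[-1]) / b[-1]
        term = [Fraction(0)] * shift + [coef]      # coef * x^shift
        quotient = add(quotient, term)
        r = add(r, mul([-t for t in term], b))     # the leading term of r cancels
    return quotient, r

def extended_euclid(a, b):
    """Return (R_n, B1, B2) with B1*b + B2*a = R_n, the last nonzero remainder."""
    r_prev, B1_prev, B2_prev = trim(a), [], [Fraction(1)]   # a = 0*b + 1*a
    r_cur, B1_cur, B2_cur = trim(b), [Fraction(1)], []      # b = 1*b + 0*a
    while r_cur:
        Q, R = divide(r_prev, r_cur)
        minus_Q = [-c for c in Q]
        r_prev, r_cur = r_cur, R
        B1_prev, B1_cur = B1_cur, add(B1_prev, mul(minus_Q, B1_cur))
        B2_prev, B2_cur = B2_cur, add(B2_prev, mul(minus_Q, B2_cur))
    return r_prev, B1_prev, B2_prev

# a = (x - 1)^2 and b = x - 2 have no common factor, so R_n is a nonzero constant.
a, b = [1, -2, 1], [-2, 1]
g, B1, B2 = extended_euclid(a, b)
assert len(g) == 1 and add(mul(B1, b), mul(B2, a)) == g
```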
In our application, the denominator in the rational representation of the generating function $S(x) = p(x)/q(x)$ has a nonzero constant term, which means that all of its roots $\rho_1, \ldots, \rho_t$ are nonzero, and therefore each can be written as $\rho_i = 1/\lambda_i$ for some constant $\lambda_i$. The representation in (4.29) becomes
$$\frac{p(x)}{q(x)} = \frac{A_1(x)}{(1-\lambda_1x)^{m_1}} + \cdots + \frac{A_t(x)}{(1-\lambda_tx)^{m_t}}, \tag{4.31}$$
where each constant factor $(-\lambda_i)^{m_i}$ (coming from $(x - 1/\lambda_i)^{m_i} = (-\lambda_i)^{-m_i}(1-\lambda_ix)^{m_i}$) has been absorbed into the old $A_i$.

How do we actually find the polynomials $A_1(x), \ldots, A_t(x)$? Assume for the moment that the denominator $q$ has the property that it can be factored into the form
$$q(x) = (1-\lambda_1x)\cdots(1-\lambda_kx),$$
where the nonzero constants $\lambda_1, \ldots, \lambda_k \in \mathbb{C}$ are different. We look at this case first only because its analysis is somewhat simpler. What we want to do is determine the constants $A_1, \ldots, A_k$ such that
$$\frac{p(x)}{q(x)} = \frac{A_1}{1-\lambda_1x} + \cdots + \frac{A_k}{1-\lambda_kx}. \tag{4.32}$$
For this, set $\lambda = \lambda_1$ and multiply both sides of (4.32) by the polynomial $1 - \lambda x$ to obtain
$$\frac{p(x)(1-\lambda x)}{q(x)} = A_1 + \frac{A_2(1-\lambda x)}{1-\lambda_2x} + \cdots + \frac{A_k(1-\lambda x)}{1-\lambda_kx},$$
where
$$\lim_{x\to1/\lambda}\frac{A_i(1-\lambda x)}{1-\lambda_ix} = 0$$
for all $i = 2, \ldots, k$. This gives
$$A_1 = \lim_{x\to1/\lambda}\frac{p(x)(1-\lambda x)}{q(x)},$$
and $A_1$ has been found.
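When all the $\lambda_i$ are distinct, the limit can be evaluated by cancelling the factor $1-\lambda_ix$, giving $A_i = p(1/\lambda_i)\big/\prod_{j\ne i}(1-\lambda_j/\lambda_i)$. The sketch below applies that formula with exact fractions; it assumes the $\lambda_i$ are rational, nonzero, and distinct and that $\deg(p) < k$. The function name is our own.

```python
from fractions import Fraction

def simple_root_coefficients(p, lams):
    """Constants A_i with p(x)/q(x) = sum A_i / (1 - lam_i x), where
    q(x) = (1 - lam_1 x)...(1 - lam_k x), the lam_i distinct and nonzero, deg p < k.

    A_i = lim_{x -> 1/lam_i} p(x)(1 - lam_i x)/q(x) = p(1/lam_i) / prod_{j != i} (1 - lam_j / lam_i).
    """
    def p_at(x):
        return sum(c * x ** n for n, c in enumerate(p))

    coeffs = []
    for i, lam in enumerate(lams):
        x0 = 1 / Fraction(lam)           # the root 1/lam_i of q
        denom = Fraction(1)
        for j, other in enumerate(lams):
            if j != i:
                denom *= 1 - other * x0
        coeffs.append(p_at(x0) / denom)
    return coeffs

# 1/((1 - x)(1 - 2x)) = -1/(1 - x) + 2/(1 - 2x)
print(simple_root_coefficients([1], [1, 2]))  # [Fraction(-1, 1), Fraction(2, 1)]
```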

Our method for computing the polynomials $A_1(x), \ldots, A_t(x)$ when $q$ has repeated roots is only a slight modification of the one we’ve just given for simple roots. Setting $\lambda = \lambda_1$, $A(x) = A_1(x)$, $m = m_1$, and multiplying (4.31) by $(1-\lambda x)^m$, we have
$$\frac{p(x)(1-\lambda x)^m}{q(x)} = A(x) + (-\lambda)^m(x-a)^mT(x), \tag{4.33}$$
where $a = 1/\lambda$ and
$$T(x) = \frac{A_2(x)}{(1-\lambda_2x)^{m_2}} + \cdots + \frac{A_t(x)}{(1-\lambda_tx)^{m_t}}.$$
Since $\lambda \notin \{\lambda_2, \ldots, \lambda_t\}$, the function $T(x)$ is defined for all complex numbers $x$ in the disk
$$|x - a| < R \quad\text{with radius}\quad R = \min\{|1/\lambda_i - a| : i = 2, \ldots, t\} > 0.$$
The function $f(x) = p(x)(1-\lambda x)^m/q(x)$ therefore has a Taylor series (refer to Appendix B)
$$f(x) = \sum_{i\ge0}b_i(x-a)^i, \quad\text{for } b_i = \frac{D^i(f)(a)}{i!},$$
which converges on $|x - a| < R$. (Here $D$ is the differentiation operator and $D^i(f)(a)$ is the evaluation of the $i$th derivative of $f$ at $x = a$.) Because $T(x)$ also has a Taylor series
$$T(x) = \sum_{i\ge0}t_i(x-a)^i \quad\text{on } |x - a| < R,$$
from (4.33) we have
$$\sum_{i\ge0}b_i(x-a)^i = f(x) = A(x) + \sum_{i\ge0}(-\lambda)^mt_i(x-a)^{i+m}.$$
The uniqueness of the Taylor series implies that
$$A(x) = \sum_{i=0}^{m-1}b_i(x-a)^i = \sum_{i=0}^{m-1}\frac{D^i(f)(a)}{i!}(x-a)^i,$$
the Taylor polynomial of degree $m-1$. This result is summarized in the following.

The Partial Fraction Decomposition
If $q(x) = \prod_{i=1}^{t}(1-\lambda_ix)^{m_i}$ (for distinct nonzero $\lambda_1, \ldots, \lambda_t$), then the partial fraction decomposition of any rational function $p(x)/q(x)$ with $\deg(p) < \deg(q)$ has the form
$$\frac{p(x)}{q(x)} = \sum_{i=1}^{t}\sum_{j=1}^{m_i}\frac{\alpha_{ij}}{(1-\lambda_ix)^j},$$
where for each $i = 1, \ldots, t$, $\sum_{j=1}^{m_i}\alpha_{ij}(1-\lambda_ix)^{m_i-j}$ is the Taylor polynomial about $a_i = 1/\lambda_i$ of degree $m_i - 1$ for
$$f_i(x) = \frac{p(x)(1-\lambda_ix)^{m_i}}{q(x)}.$$
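The boxed recipe can be implemented exactly when the $\lambda_i$ are rational. In this sketch (our own, with hypothetical names) the Taylor coefficients $b_i$ of $f_i$ about $a_i = 1/\lambda_i$ are obtained by shifting $p$ and $q(x)/(1-\lambda_ix)^{m_i}$ to the variable $u = x - a_i$ and dividing power series in $u$. Then $(x - a_i)^k = (-a_i)^k(1-\lambda_ix)^k$ converts $A_i$ into powers of $1-\lambda_ix$. It assumes $q$ is given in factored form as pairs $(\lambda_i, m_i)$ with $q(0) = 1$.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def shift(poly, a):
    """Coefficients of poly(a + u) as a polynomial in u (Taylor coefficients about a)."""
    return [sum(Fraction(poly[k]) * comb(k, j) * a ** (k - j) for k in range(j, len(poly)))
            for j in range(len(poly))]

def series_div(num, den, n_terms):
    """First n_terms coefficients of num(u)/den(u); den[0] must be nonzero."""
    out = []
    for n in range(n_terms):
        acc = num[n] if n < len(num) else Fraction(0)
        for j in range(1, min(n, len(den) - 1) + 1):
            acc -= den[j] * out[n - j]
        out.append(acc / den[0])
    return out

def partial_fractions(p, factors):
    """Return {(lam, j): alpha} with p(x)/q(x) = sum alpha / (1 - lam x)^j.

    factors lists pairs (lam, m) with q(x) = prod (1 - lam x)^m, lam distinct and
    nonzero, and deg p < deg q.
    """
    result = {}
    for lam, m in factors:
        a = 1 / Fraction(lam)
        other = [Fraction(1)]                 # q(x) / (1 - lam x)^m
        for lam2, m2 in factors:
            if lam2 != lam:
                for _ in range(m2):
                    other = poly_mul(other, [Fraction(1), -Fraction(lam2)])
        b = series_div(shift(p, a), shift(other, a), m)   # b_k = f^(k)(a) / k!
        for k in range(m):
            # b_k (x - a)^k / (1 - lam x)^m = b_k (-a)^k / (1 - lam x)^(m - k)
            result[(lam, m - k)] = b[k] * (-a) ** k
    return result

# 1/((1 - x)(1 - 2x)^2) = 1/(1 - x) + 2/(1 - 2x)^2 - 2/(1 - 2x)
print(partial_fractions([1], [(1, 1), (2, 2)]))
```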
There are other ways to determine the constants $\alpha_{ij}$. For instance, using the notation above we again suppose $q$ has the factorization $q(x) = a(x)b(x)$, where
$$a(x) = (1-\lambda_1x)^{m_1}, \qquad b(x) = (1-\lambda_2x)^{m_2}\cdots(1-\lambda_tx)^{m_t},$$
and $\lambda_1, \ldots, \lambda_t$ are distinct and nonzero. Write $m = m_1$. Setting $q_1(x) = q(x)/(1-\lambda_1x)$, from the form of the partial fraction decomposition we know that there exist $\alpha_{1m} \in \mathbb{C}$ and a polynomial $p_1$ such that
$$\frac{p(x)}{q(x)} = \frac{\alpha_{1m}}{(1-\lambda_1x)^m} + \frac{p_1(x)}{q_1(x)}.$$
Multiplying this by $(1-\lambda_1x)^m$, we obtain
$$\frac{p(x)}{b(x)} = \alpha_{1m} + \frac{p_1(x)(1-\lambda_1x)}{b(x)},$$
and since $b(1/\lambda_1)$ is nonzero,
$$\alpha_{1m} = \lim_{x\to1/\lambda_1}\left[\frac{p(x)}{b(x)} - \frac{p_1(x)(1-\lambda_1x)}{b(x)}\right] = \frac{p(1/\lambda_1)}{b(1/\lambda_1)}.$$
This determines $\alpha_{1m}$. Also, $p(1/\lambda_1) - \alpha_{1m}b(1/\lambda_1) = 0$, implying that $x = 1/\lambda_1$ is a root of $P(x) = p(x) - \alpha_{1m}b(x)$, which means that $p_1(x) = P(x)/(1-\lambda_1x)$ is a polynomial for which $\deg(p_1) < \max(\deg(p), \deg(b))$, and so $\deg(p_1) < \deg(q_1)$. So the process can be continued with $p_1(x)/q_1(x)$, and all the constants $\alpha_{ij}$ can be found in this way.

We end this section with two examples. First we find the partial fraction decomposition of $1/\big((1-x)(1-2x)^2\big)$ by calculating the required Taylor polynomials for $\lambda = 1, 2$. For $\lambda = 1$, we want the Taylor polynomial of degree 0 for $f_1(x) = 1/(1-2x)^2$, which is $f_1(1) = 1$, while for $\lambda = 2$ the Taylor polynomial of degree 1 is required for $f_2(x) = 1/(1-x)$, which is
$$f_2(1/2) + f_2'(1/2)(x - 1/2) = 2 + 4(x - 1/2) = 2 - 2(1-2x).$$
This gives the partial fraction decomposition
$$\frac{p(x)}{q(x)} = \frac{1}{1-x} + \frac{2}{(1-2x)^2} - \frac{2}{1-2x}.$$
Now let’s use the second method for determining $\alpha_{11}, \alpha_{22}, \alpha_{21}$ such that
$$\frac{p(x)}{q(x)} = \frac{\alpha_{11}}{1-x} + \frac{\alpha_{22}}{(1-2x)^2} + \frac{\alpha_{21}}{1-2x}.$$
From $p(x) = 1$ and $b(x) = (1-2x)^2$, $\alpha_{11} = p(1)/b(1) = 1$. Also,
$$p_1(x) = \frac{1 - 1\cdot(1-2x)^2}{1-x} = 4x,$$
which means that we want to find $\alpha_{22}, \alpha_{21}$ such that
$$\frac{4x}{(1-2x)^2} = \frac{p(x)}{q(x)} = \frac{\alpha_{22}}{(1-2x)^2} + \frac{\alpha_{21}}{1-2x},$$
where now $p(x) = 4x$ and $q(x) = (1-2x)^2$. For this, the new $b(x)$ equals 1, and so $\alpha_{22} = p(1/2)/b(1/2) = 2$ and
$$p_1(x) = \frac{4x - 2}{1-2x} = -2 = \alpha_{21},$$
as found above.

For another example, consider $p(x)/q(x)$ for $p(x) = -4 - 13x - 2x^2 - 7x^3 + 8x^4$ and $q(x) = 1 + x - 5x^2 - x^3 + 8x^4 - 4x^5$. To find the factorization of $q(x)$, we note that $q(1) = 0$ and divide $q(x)$ by $1-x$ to get
$$q(x) = (1-x)(4x^4 - 4x^3 - 3x^2 + 2x + 1).$$
The quotient is also divisible by $1-x$, and in fact,
$$q(x) = (1-x)^3(4x^2 + 4x + 1) = (1-x)^3(1+2x)^2.$$
We know that
$$\frac{p(x)}{q(x)} = \frac{A_1(x)}{(1-x)^3} + \frac{A_2(x)}{(1+2x)^2},$$
where $A_1(x)$ is the Taylor polynomial of degree 2 about $x = 1$ for
$$f_1(x) = \frac{p(x)(1-x)^3}{q(x)} = \frac{p(x)}{(1+2x)^2},$$
and $A_2(x)$ is the Taylor polynomial of degree 1 for $f_2(x) = p(x)/(1-x)^3$ about $x = -1/2$. Calculation gives
$$\begin{aligned}A_1(x) &= -2 + 2(x-1) + (x-1)^2 = -2 - 2(1-x) + (1-x)^2,\\ A_2(x) &= 1 - 4(x + 1/2) = 1 - 2(1+2x),\end{aligned}$$
and the decomposition is
$$\frac{p(x)}{q(x)} = \frac{-2}{(1-x)^3} - \frac{2}{(1-x)^2} + \frac{1}{1-x} + \frac{1}{(1+2x)^2} - \frac{2}{1+2x}. \tag{4.34}$$


4.4 Examples of the Generating Function Technique

The generating function technique we’ve just described allows us to solve the linear difference equation $L(s) = \psi$ when the generating function $F(x)$ for the input function is a rational function. When $\psi$ is a polynomial, from the method in the last sections we can write $F(x)$ as a sum of rational functions whose power series have the form given in (4.25). As an example, let us apply the generating function technique to solve
$$s_n = -s_{n-1} + 5s_{n-2} + s_{n-3} - 8s_{n-4} + 4s_{n-5}, \qquad S_0 = (9, -43, -13, -9, -4)^T,$$
which has characteristic polynomial $\mathrm{ch}(x) = x^5 + x^4 - 5x^3 - x^2 + 8x - 4$, giving
$$M_0S_0 = \begin{pmatrix}1 & 1 & -5 & -1 & 8\\ 0 & 1 & 1 & -5 & -1\\ 0 & 0 & 1 & 1 & -5\\ 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}\begin{pmatrix}9\\ -43\\ -13\\ -9\\ -4\end{pmatrix} = \begin{pmatrix}8\\ -7\\ -2\\ -13\\ -4\end{pmatrix}.$$
By (4.28) the generating function of $(s_n)$ is
$$S(x) = \frac{-4 - 13x - 2x^2 - 7x^3 + 8x^4}{1 + x - 5x^2 - x^3 + 8x^4 - 4x^5},$$
whose partial fraction decomposition is given in (4.34), and from (4.25),
$$\begin{aligned}s_n &= -2\binom{n+2}{2} - 2\binom{n+1}{1} + 1 + \binom{n+1}{1}(-2)^n - 2(-2)^n\\ &= -(n^2 + 5n + 3) + (n-1)(-2)^n.\end{aligned}$$
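As a spot check of this solution (not a proof), the sketch below iterates the fifth-order recurrence from $s_0, \ldots, s_4 = -4, -9, -13, -43, 9$ (the entries of $S_0$ read from the bottom up) and compares with the closed form for the first twenty terms. The helper name is hypothetical.

```python
def iterate(c, init, n_terms):
    """s_{n+k} = c_1 s_{n+k-1} + ... + c_k s_n, starting from init = [s_0, ..., s_{k-1}]."""
    s = list(init)
    while len(s) < n_terms:
        s.append(sum(ci * s[-i] for i, ci in enumerate(c, start=1)))
    return s

# s_n = -s_{n-1} + 5 s_{n-2} + s_{n-3} - 8 s_{n-4} + 4 s_{n-5}
s = iterate([-1, 5, 1, -8, 4], [-4, -9, -13, -43, 9], 20)
closed = [-(n * n + 5 * n + 3) + (n - 1) * (-2) ** n for n in range(20)]
assert s == closed
```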


The rest of this section considers two nonlinear recurrences that can be 
solved using generating functions. 


4.4.1 The Catalan numbers

For integer $n \ge 0$ consider a string of $n+1$ elements from a set with a binary operation. If the operation is not associative, then the result of
$$a_0a_1\cdots a_n$$
is ambiguous until parentheses are inserted, and we can ask how many different arrangements of parentheses can be inserted in this string. This number is called the $n$th Catalan number $C_n$. For instance, when $n = 2$ the possibilities are
$$a_0(a_1a_2) \quad\text{and}\quad (a_0a_1)a_2,$$
giving $C_2 = 2$, and $C_3 = 5$ since the choices for $n = 3$ are
$$(a_0a_1)(a_2a_3);\quad a_0((a_1a_2)a_3);\quad (a_0(a_1a_2))a_3;\quad a_0(a_1(a_2a_3));\quad ((a_0a_1)a_2)a_3.$$
In [26], Catalan considers this sequence, and he points out that Lamé [91] already proved that the sequence satisfies the nonlinear recurrence
$$C_{n+1} = C_0C_n + C_1C_{n-1} + \cdots + C_nC_0 \quad\text{for all } n \ge 0,$$
with initial values $C_0 = 1$, $C_1 = 1$. Some authors (among them [18]) claim that this sequence was originally investigated by Euler [60].

To see that the sequence of Catalan numbers does indeed satisfy the recurrence, consider any arrangement of parentheses in an $(n+2)$-string as defining a sequence of multiplications. The final multiplication is a product of a $(k+1)$-string and an $(n+1-k)$-string (for some $k$, $0 \le k \le n$), where the parentheses in the two factors can be arranged in $C_k$ and $C_{n-k}$ ways, respectively. This means that for each such $k$ there is a total of $C_kC_{n-k}$ different arrangements of parentheses in which the first factor of the final multiplication is a $(k+1)$-string. Summing over the allowable $k = 0, 1, \ldots, n$ we obtain $C_{n+1} = C_0C_n + C_1C_{n-1} + \cdots + C_nC_0$, as required.

Let’s use generating functions to solve this recurrence. Applying the definition of multiplication of power series as in (4.13), the generating function $C(x)$ for the Catalan numbers satisfies
$$(C(x))^2 = \sum_{n\ge0}(C_0C_n + C_1C_{n-1} + \cdots + C_nC_0)x^n = \sum_{n\ge0}C_{n+1}x^n,$$
which gives
$$1 + x(C(x))^2 = 1 + x\sum_{n\ge0}C_{n+1}x^n = C_0 + \sum_{n\ge1}C_nx^n = C(x).$$
Setting $z = C(x)$, then $z$ is a power series solution to the quadratic equation $xz^2 - z + 1 = 0$, and Exercise 3.19 implies that $C(x)$ must be one of
$$\frac{1 \pm \sqrt{1-4x}}{2x}.$$
By the Generalized Binomial Theorem in (3.18) for $r = 1/2$, the power series of $\sqrt{1-4x}$ is $\sum_{n\ge0}\binom{1/2}{n}(-4x)^n$, and we obtain
$$C(x) = \frac{1 \pm \sum_{n\ge0}\binom{1/2}{n}(-4x)^n}{2x}.$$
Comparing the first coefficients of these power series, we see that the negative sign must be chosen (with the positive sign the numerator has constant term 2, so the quotient is not a power series), and
$$xC(x) = -\frac12\sum_{k\ge0}\binom{1/2}{k+1}(-4)^{k+1}x^{k+1} = \sum_{k\ge0}\frac{1}{k+1}\binom{2k}{k}x^{k+1}$$
by Exercise 4.24. This gives
$$C_n = \frac{1}{n+1}\binom{2n}{n} \quad\text{for all } n \ge 0.$$
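A brief computational cross-check, offered as an illustration only: the sketch computes $C_n$ from Lamé’s recurrence, compares it with the closed form $C_n = \frac{1}{n+1}\binom{2n}{n}$, and explicitly lists the parenthesizations of $a_0a_1a_2a_3$. The enumeration adds an outermost pair of parentheses to every product, which does not change the count. Function names are our own.

```python
from math import comb

def catalan_by_recurrence(n_terms):
    """C_0 = 1 and C_{n+1} = C_0 C_n + C_1 C_{n-1} + ... + C_n C_0."""
    C = [1]
    while len(C) < n_terms:
        n = len(C) - 1
        C.append(sum(C[k] * C[n - k] for k in range(n + 1)))
    return C

def parenthesizations(items):
    """All ways to fully parenthesize the product of the symbols in items."""
    if len(items) == 1:
        return [items[0]]
    result = []
    for k in range(1, len(items)):          # the final multiplication splits after item k-1
        for left in parenthesizations(items[:k]):
            for right in parenthesizations(items[k:]):
                result.append("(" + left + right + ")")
    return result

C = catalan_by_recurrence(12)
assert C == [comb(2 * n, n) // (n + 1) for n in range(12)]
# A string of n + 1 symbols admits C_n arrangements; for n = 3 these are the five listed above.
assert len(parenthesizations(["a0", "a1", "a2", "a3"])) == C[3] == 5
```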


4.4.2 Stirling numbers of the second kind

The second combinatorial problem we consider is associated with partitioning a finite set, and our counting argument uses partial fractions. A partition of the $n$-set $S = \{1, 2, \ldots, n\}$ is a set of disjoint nonempty subsets (often called equivalence classes) whose union is $S$. For each pair of positive integers $k, n$ with $k \le n$, the Stirling number of the second kind is denoted by $\left\{ {n \atop k} \right\}$ and is defined to be the number of partitions of an $n$-set in which there are $k$ classes. For instance, considering the partitions of a 4-set into $k = 2$ equivalence classes, there are three in which both equivalence classes have two elements (consider the possibilities for the class containing 1) and four that have a 1-set and a 3-set. This gives $\left\{ {4 \atop 2} \right\} = 7$.

First we obtain a recurrence. For this we note that the number of partitions with $k$ classes in which $\{n\}$ is one of the equivalence classes is $\left\{ {n-1 \atop k-1} \right\}$. In the other partitions, $n$ is in an equivalence class with at least one other element, and when $n$ is removed what remains is a partition of an $(n-1)$-set in which there are $k$ classes. There are $\left\{ {n-1 \atop k} \right\}$ such partitions, and $n$ might have originally been in any of the $k$ classes. Therefore, for all $1 \le k \le n$,
$$\left\{ {n \atop k} \right\} = \left\{ {n-1 \atop k-1} \right\} + k\left\{ {n-1 \atop k} \right\}. \tag{4.35}$$
We extend this to all natural numbers $n, k$ by defining $\left\{ {n \atop k} \right\}$ to be zero when $k > n$ or $k = 0$ (except for $\left\{ {0 \atop 0} \right\} = 1$). For each fixed $k \ge 1$ define the generating function
$$S_k(x) = \sum_{n\ge0}\left\{ {n \atop k} \right\}x^n = \sum_{n\ge k}\left\{ {n \atop k} \right\}x^n,$$
which from (4.35) becomes
$$S_k(x) = \sum_{n\ge k}\left\{ {n-1 \atop k-1} \right\}x^n + k\sum_{n\ge k}\left\{ {n-1 \atop k} \right\}x^n = xS_{k-1}(x) + kxS_k(x).$$
This gives $S_k(x) = \dfrac{x}{1-kx}S_{k-1}(x)$, and inductively
$$S_k(x) = \frac{x^2S_{k-2}(x)}{(1-kx)(1-(k-1)x)} = \cdots = \frac{x^k}{(1-x)(1-2x)\cdots(1-kx)},$$
since $S_0(x) = 1$. From the theory of partial fractions we know that there exist $\alpha_j \in \mathbb{C}$ such that
$$\frac{1}{(1-x)(1-2x)\cdots(1-kx)} = \sum_{j=1}^{k}\frac{\alpha_j}{1-jx},$$
which means that
$$S_k(x) = x^k\sum_{j=1}^{k}\frac{\alpha_j}{1-jx} = x^k\sum_{j=1}^{k}\alpha_j\sum_{n\ge0}j^nx^n = \sum_{n\ge0}\sum_{j=1}^{k}\alpha_jj^nx^{n+k} = \sum_{n\ge k}\Big(\sum_{j=1}^{k}\alpha_jj^{n-k}\Big)x^n.$$
For each $j = 1, \ldots, k$, $\alpha_j$ can be calculated (refer to Exercise 4.25) to be
$$\alpha_j = \frac{(-1)^{k-j}j^{k-1}}{(j-1)!\,(k-j)!},$$
and this gives the formula
$$\left\{ {n \atop k} \right\} = \sum_{j=1}^{k}\frac{(-1)^{k-j}j^n}{j!\,(k-j)!} \quad(\text{for all } n, k > 0).$$
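The explicit formula can be compared with the recurrence (4.35) using exact rational arithmetic. A minimal sketch, with hypothetical function names; the check covers $1 \le n, k \le 10$, including $k > n$, where both sides should be zero.

```python
from fractions import Fraction
from math import factorial

def stirling2_table(n_max):
    """S[n][k] from (4.35), with S[0][0] = 1 and S[n][k] = 0 for k > n or k = 0 < n."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]
    return S

def stirling2_formula(n, k):
    """Explicit formula: sum_{j=1}^{k} (-1)^(k-j) j^n / (j! (k-j)!)."""
    return sum(Fraction((-1) ** (k - j) * j ** n, factorial(j) * factorial(k - j))
               for j in range(1, k + 1))

S = stirling2_table(10)
assert S[4][2] == 7
for n in range(1, 11):
    for k in range(1, 11):
        assert stirling2_formula(n, k) == S[n][k]
```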
4.5 Reversion of Generating Functions


In this section we explore another method for recovering the elements of a sequence from its generating function. (Be sure to review the material in Appendix B.) It’s called the Fourier Transform Method and is a preview of the Fast Fourier Transform, which we describe in detail in Chapter 9 (refer to Section 9.5). The basic idea is to replace the formal differentiation process by an appropriately selected weighted sum of some values of the generating function. Remember that in order to evaluate the generating function at any complex number $x = a$, the power series must be more than simply a formal series; it must converge at $x = a$. The disk of convergence must contain a disk around each complex number at which the series is to be evaluated.

In Appendix B we discussed primitive $n$th roots of unity, complex numbers $w$ such that $w^n = 1$ and $w^j \ne 1$ for all $1 \le j < n$. For example, $w_2 = -1$ is the only primitive second root of unity. For general $n \ge 1$,
$$w_n = e^{2\pi i/n} = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}$$
is always a primitive $n$th root of unity and is often called the principal $n$th root of unity. When $w$ is any $n$th root of unity, then
$$0 = w^n - 1 = (w-1)(w^{n-1} + \cdots + w + 1)$$
implies
$$w^{n-1} + \cdots + w + 1 = 0, \quad\text{provided } w \ne 1. \tag{4.36}$$


The next result shows how roots of unity can be used to recover the elements 
of finite sequences. 


Theorem 4.5.1. Let $(s_0, \ldots, s_{p-1})$ be a finite sequence of length $p \ge 2$ with generating polynomial $S(x) = s_0 + s_1x + \cdots + s_{p-1}x^{p-1}$. If $w$ is a primitive $p$th root of unity, then $s_j$ can be computed by the formula
$$s_j = \frac1p\sum_{n=0}^{p-1}w^{-jn}S(w^n), \quad\text{for any } j = 0, 1, \ldots, p-1.$$
Proof. For any fixed $j = 0, 1, \ldots, p-1$ we have
$$\begin{aligned}w^{-j}S(w) &= s_0w^{-j} + s_1w^{1-j} + \cdots + s_{p-1}w^{p-1-j},\\ w^{-2j}S(w^2) &= s_0w^{-2j} + s_1w^{2(1-j)} + \cdots + s_{p-1}w^{2(p-1-j)},\\ &\ \ \vdots\\ w^{-(p-1)j}S(w^{p-1}) &= s_0w^{-(p-1)j} + s_1w^{(p-1)(1-j)} + \cdots + s_{p-1}w^{(p-1)(p-1-j)}.\end{aligned}$$
Adding these $p-1$ equations to $S(1) = s_0 + s_1 + \cdots + s_{p-1}$, we obtain
$$\sum_{n=0}^{p-1}w^{-jn}S(w^n) = s_0\sum_{n=0}^{p-1}w^{-jn} + s_1\sum_{n=0}^{p-1}w^{(1-j)n} + \cdots + s_{p-1}\sum_{n=0}^{p-1}w^{(p-1-j)n},$$
where the coefficient of $s_i$ in this equation is $C_i = \sum_{n=0}^{p-1}w^{(i-j)n}$. When $0 \le i, j \le p-1$ and $i \ne j$, the primitivity of $w$ implies $w^{i-j} \ne 1$, and (4.36), applied to the $p$th root of unity $w^{i-j}$, gives $C_i = 0$. Therefore,
$$C_i = \begin{cases}0 & \text{if } i \ne j,\\ p & \text{if } i = j,\end{cases}$$
and
$$\sum_{n=0}^{p-1}w^{-jn}S(w^n) = ps_j,$$
proving the theorem. □
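Theorem 4.5.1 translates directly into a few lines of floating-point code. A sketch, assuming $w = e^{2\pi i/p}$ (the principal root; any primitive root would do) and a generating polynomial supplied as a Python callable; the test sequence $3, 1, 4, 1, 5$ is arbitrary illustrative data. The recovered values are complex numbers that agree with the $s_j$ up to rounding error.

```python
import cmath

def recover_coefficients(S, p):
    """Return approximations to [s_0, ..., s_{p-1}] using Theorem 4.5.1:
    s_j = (1/p) * sum_n w^(-j n) S(w^n), with w = exp(2 pi i / p)."""
    w = cmath.exp(2j * cmath.pi / p)
    values = [S(w ** n) for n in range(p)]            # S(1), S(w), ..., S(w^(p-1))
    return [sum(w ** (-j * n) * values[n] for n in range(p)) / p for j in range(p)]

coeffs = [3, 1, 4, 1, 5]                              # illustrative data, p = 5

def S(x):
    """Generating polynomial 3 + x + 4x^2 + x^3 + 5x^4 of the illustrative sequence."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

recovered = recover_coefficients(S, len(coeffs))
print([round(z.real, 6) for z in recovered])
```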


This result can be easily extended to periodic sequences, infinite sequences formed by juxtaposing copies of a fixed finite sequence,
$$s_0, s_1, \ldots, s_{p-1}, s_0, s_1, \ldots, s_{p-1}, \ldots.$$
(The number $p$ is called the period of $(s_n)$ when it’s the least positive integer with this property.) Since the first $p$ terms of such sequences are the terms of the finite sequence $(s_0, s_1, \ldots, s_{p-1})$, using Theorem 4.5.1 its terms can be calculated as
$$s_j = \frac1p\sum_{n=0}^{p-1}w^{-jn}S(w^n).$$
Since the sequence $(s_n)$ has been assumed to be the infinite juxtaposition of these $p$ terms, we’ve proved the following result.

Corollary 4.5.2. Let $(s_n)$ be a periodic sequence with period $p$ and let $w$ be a primitive $p$th root of unity. If $S(x)$ is the generating function for the period $s_0, s_1, \ldots, s_{p-1}$, then the terms are
$$s_{kp+j} = \frac1p\sum_{n=0}^{p-1}w^{-jn}S(w^n) \quad\text{for all } k \ge 0 \text{ and } 0 \le j \le p-1.$$

It should be noted that the technique used in these results can be extended to more general sequences, provided we can evaluate the generating function of the sequence in some neighborhood of $x = 0$. For this, we cannot look at the series expansion as only a formal power series, but rather require it to satisfy the analytic property of convergence.
Let us first illustrate the technique by using it to recover $s_0$ and $s_1$. Suppose $S(x)$ converges on $|x| < R$. Then each of the series
$$S(x) = s_0 + s_1x + \cdots + s_nx^n + \cdots$$
and
$$S(-x) = s_0 - s_1x + \cdots + (-1)^ns_nx^n + \cdots$$
converges for every $|x| < R$, as does the sum
$$S(x) + S(-x) = 2(s_0 + s_2x^2 + \cdots + s_{2n}x^{2n} + \cdots) = 2f_2(x), \tag{4.37}$$
where here $f_2(x)$ is the function $f_2(x) = \sum_{n\ge0}s_{2n}x^{2n}$ discussed in Exercise 4.26(b). Therefore, (4.37) becomes
$$s_0 = f_2(0) = \lim_{x\to0}\frac{S(x) + S(-x)}{2},$$
and we’ve recovered $s_0$! Also, considering the difference $S(x) - S(-x)$ and Exercise 4.26, we have
$$S(x) - S(-x) = 2x(s_1 + s_3x^2 + \cdots + s_{2n+1}x^{2n} + \cdots),$$
so
$$s_1 = \lim_{x\to0}\frac{S(x) - S(-x)}{2x}, \tag{4.38}$$
the next element of the sequence.

To generalize this to a general reversion formula, notice that (4.38) can be rewritten as
$$s_1 = \lim_{x\to0}\frac{S(x) + wS(wx)}{2x}$$
when $w = -1$, the principal 2nd root of unity. Although this might seem like a complicated way to write (4.38), we will see that this type of identity works in general.

As a warm-up for the general case, let $w$ be the principal 3rd root of unity. Since
$$|x| = |wx| = |w^2x| \quad\text{for all } x,$$
then for $x$ within $|x| < R$ the generating function $S(x)$ converges at all three of $x$, $wx$, and $w^2x$. Therefore, $w^3 = 1$ implies
$$\begin{aligned}S(wx) &= s_0 + s_1wx + s_2w^2x^2 + s_3x^3 + s_4wx^4 + s_5w^2x^5 + \cdots,\\ S(w^2x) &= s_0 + s_1w^2x + s_2wx^2 + s_3x^3 + s_4w^2x^4 + s_5wx^5 + \cdots,\end{aligned}$$
and
$$\begin{aligned}S(x) + wS(wx) + w^2S(w^2x) &= s_0 + s_1x + s_2x^2 + s_3x^3 + s_4x^4 + s_5x^5 + \cdots\\ &\quad + s_0w + s_1w^2x + s_2x^2 + s_3wx^3 + s_4w^2x^4 + s_5x^5 + \cdots\\ &\quad + s_0w^2 + s_1wx + s_2x^2 + s_3w^2x^3 + s_4wx^4 + s_5x^5 + \cdots\\ &= 0s_0 + 0s_1x + 3s_2x^2 + 0s_3x^3 + 0s_4x^4 + 3s_5x^5 + \cdots,\end{aligned}$$
the last from (4.36). This can be reworded as
$$S(x) + wS(wx) + w^2S(w^2x) = 3x^2f_3(x), \tag{4.39}$$
where again from Exercise 4.26 we obtain
$$s_2 = \lim_{x\to0}\frac{S(x) + wS(wx) + w^2S(w^2x)}{3x^2}.$$
In Exercise 4.27 you show that the pattern in (4.39) generalizes to
$$S(x) + wS(wx) + \cdots + w^{m-1}S(w^{m-1}x) = mx^{m-1}f_m(x), \tag{4.40}$$
and from Exercise 4.26 we therefore obtain
$$s_{m-1} = \lim_{x\to0}\frac{1}{mx^{m-1}}\sum_{n=0}^{m-1}w^nS(w^nx).$$
We have proved the following Reversion Formula.

The Reversion Formula
If the generating function $S(x)$ for the sequence $(s_n)$ converges in some neighborhood of $x = 0$, then for any integer $m \ge 2$,
$$s_{m-1} = \lim_{x\to0}\frac{1}{mx^{m-1}}\sum_{n=0}^{m-1}w_m^nS(w_m^nx), \tag{4.41}$$
where $w_m$ is the principal $m$th root of unity.

Note that we could have allowed $w_m$ to be any primitive $m$th root of unity in this Reversion Formula.
As an example, let us use the Reversion Formula to find the first two terms of the Fibonacci sequence. In Exercise 4.26 the radius of convergence for the Fibonacci generating function $F(x) = x/(1-x-x^2)$ is shown to be $R = (\sqrt5-1)/2$. We can therefore use (4.37) and (4.38), the case $m = 2$, to obtain the first two terms of the sequence,
$$f_0 = \lim_{x\to0}\frac{F(x) + F(-x)}{2} = \lim_{x\to0}\frac{x^2}{(1-x-x^2)(1+x-x^2)} = 0$$
and
$$f_1 = \lim_{x\to0}\frac{F(x) - F(-x)}{2x} = \lim_{x\to0}\frac{1-x^2}{(1-x-x^2)(1+x-x^2)} = 1.$$
Of course, for more complicated generating functions the limits in the Reversion Formula might be difficult to compute. In such situations, approximations to the limit give an estimate for the elements in the sequence.
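Such an approximation can be sketched by evaluating the expression inside the limit (4.41) at a small fixed $x$ instead of letting $x \to 0$. Here $x = 0.1$ is an arbitrary choice well inside the radius of convergence $(\sqrt5-1)/2$; by (4.40) the error is of order $s_{2m-1}x^m$, small enough that rounding recovers $f_1, \ldots, f_7$. This is an illustration of the idea, not a robust numerical method.

```python
import cmath

def F(x):
    """Fibonacci generating function x / (1 - x - x^2), convergent for |x| < (sqrt(5) - 1)/2."""
    return x / (1 - x - x * x)

def reversion_estimate(S, m, x=0.1):
    """Evaluate (1/(m x^(m-1))) * sum_n w^n S(w^n x) from (4.41) at a fixed small x
    (x = 0.1 is an illustrative choice), with w the principal m-th root of unity."""
    w = cmath.exp(2j * cmath.pi / m)
    total = sum(w ** n * S(w ** n * x) for n in range(m))
    return total / (m * x ** (m - 1))

estimates = [reversion_estimate(F, m) for m in range(2, 9)]   # approximates f_1, ..., f_7
print([round(z.real) for z in estimates])
```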


4.5.1 Using the Fourier Transform

For an approximation, we would like to truncate the power series and make it a polynomial. Of course, we don’t know the power series or its approximating polynomial. But we do know a formula for the generating function, and we can use this formula to calculate approximate values of the approximating polynomial. The really clever trick here is that the inverse Fourier transform can be used to go from a set of values to the coefficients of a polynomial. With the FFT algorithm this calculation can be done quickly: the $n$ coefficients of a polynomial of degree less than $n$ can be computed from its $n$ values at the $n$th roots of unity using $O(n\log n)$ arithmetic operations. (See Section 9.5 for more details on the FFT.)

Consider the polynomial $p(x) = 1 + x$. Evaluating at the 4th roots of unity $1, i, -1, -i$ gives
$$p(1) = 2,\quad p(i) = 1 + i,\quad p(-1) = 0,\quad p(-i) = 1 - i.$$
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If we express p(x) as p(x) = co + C1" + Cox” + 32°, then the coefficients 
can be computed via the inverse Fourier transform as 


co = 5 (PC) + p(s) + 2(-1) + v2) 
=1, 

cx = 5 [P(L) + vl) - (-8) + 9(-1) - (-#)? + 9l-a) (899 
=7i2 b1—i + 0 bie 
=1, 

a=3 pl) -+a@ +(-1) +21) *(-1)" + 9-9) {-1)"] 
=7p i=— aq L4i 
=0, 

c3 =; p(1) + pCi) - @) + p(-1) - (#)? + v(-A) - 
=ip La + 0 14] 
=0. 


This evaluation at the 4th roots of unity is the 4-point Fourier transform. Notice that the evaluation points are $i^0, i^1, i^2, i^3$, the powers of $i$ taken in this order. Back-calculating the coefficients of $p(x)$ from these values is the inverse Fourier transform. This inverse calculation can also be viewed as treating the values of $p(x)$ as coefficients of another polynomial, evaluating this other polynomial at the powers of $-i$, and then dividing these values by 4. The results are the coefficients of $p(x)$.
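The recipe just described can be sketched in a few lines of Python: evaluate the "values polynomial" at the powers of $-i$, then divide by 4. This only illustrates the 4-point case, and the helper names `evaluate` and `inverse_transform_4` are our own.

```python
def evaluate(coeffs, z):
    """Evaluate coeffs[0] + coeffs[1] z + coeffs[2] z^2 + ... by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def inverse_transform_4(values):
    """Recover c_0, ..., c_3 from p(1), p(i), p(-1), p(-i).

    The values serve as coefficients of a new polynomial, which is evaluated
    at (-i)^0, (-i)^1, (-i)^2, (-i)^3; each result is then divided by 4.
    """
    return [evaluate(values, (-1j) ** j) / 4 for j in range(4)]

points = [1, 1j, -1, -1j]                         # i^0, i^1, i^2, i^3
values = [evaluate([1, 1], z) for z in points]    # p(x) = 1 + x
coeffs = inverse_transform_4(values)
print(coeffs)
```

Up to rounding, the printed coefficients are $1, 1, 0, 0$, which matches $p(x) = 1 + x$.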

If we think of the roots of unity as being arranged around the unit circle in the complex plane, the powers of $i$ are found by going counterclockwise around this circle, and the powers of $-i$ are found by going clockwise around the circle. So by calculating in the counterclockwise direction we compute the Fourier transform, and by calculating in the clockwise direction we compute the inverse Fourier transform. The factor $1/n$ (in this example $1/4$) is needed to normalize the result.

The inverse Fourier transform (except for the normalization) can be cal- 
culated by an algorithm for the Fourier transform by making use of complex 
conjugation. Section 9.5 explains this in more detail. 
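A small sketch of the conjugation trick follows, using our own conventions. `forward_transform` evaluates a polynomial at the powers of $\omega = e^{2\pi i/n}$. For clarity it does this directly in $O(n^2)$ operations, although an FFT could be substituted. The inverse is then obtained as $\overline{\text{forward}(\overline{\text{values}})}/n$. Both function names are hypothetical.

```python
import cmath

def forward_transform(coeffs):
    """Values of sum_j coeffs[j] x^j at x = w^0, ..., w^(n-1), where w = e^(2 pi i / n)."""
    n = len(coeffs)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(c * w ** (j * k) for j, c in enumerate(coeffs)) for k in range(n)]

def inverse_via_conjugation(values):
    """Inverse transform built from the forward one:
    c_j = (1/n) sum_k values[k] w^(-jk) = conj(forward(conj(values)))[j] / n."""
    n = len(values)
    conjugated = [complex(v).conjugate() for v in values]
    return [z.conjugate() / n for z in forward_transform(conjugated)]

coeffs = [3, -1, 4, 1, 5, 9, 2, 6]
recovered = inverse_via_conjugation(forward_transform(coeffs))
print([round(z.real, 9) for z in recovered])
```

The round trip returns the original coefficients up to rounding error. The point is that only a forward-transform routine is needed.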

What does all this have to do with reversion of generating functions? Generally, generating functions are not polynomials, but near $x = 0$ one hopes that a generating function may be approximated by a Taylor polynomial. For example, the Fibonacci generating function

$$F(x) = 0 + x + x^2 + 2x^3 + 3x^4 + 5x^5 + 8x^6 + \cdots$$

may be approximated close to $x = 0$ by the polynomial $0 + x$. On the other hand, evaluating $F(x)$ at the roots of unity looks "iffy" because, for example, the series for $F(1)$ does not converge, since $1$ lies outside the disk of convergence $|x| < (\sqrt{5} - 1)/2$. A better-behaved series can be obtained by scaling. For example, replacing $x$ by $.1z$ in $F(x)$ gives


$$G(z) = F(.1z) = 0 + .1z + .01z^2 + .002z^3 + .0003z^4 + \cdots.$$


Since the series for $G(z)$ converges for $|z| = 1$ (its radius of convergence is $10R \approx 6.18$) and $G(z)$ may be "reasonably approximated" by the polynomial

$$0 + .1z + .01z^2 + .002z^3$$

when $|z| = 1$, we expect that if we had the values of $G(z)$ for $z = 1, i, -1, -i$, we would be able to use the inverse Fourier transform to calculate a polynomial of degree at most 3 whose coefficients should be reasonably close to the coefficients of

$$G_4(z) = 0 + .1z + .01z^2 + .002z^3.$$

We can use

$$G(z) = \frac{.1z}{1 - .1z - .01z^2}$$

to find

$$G(1) \approx .11236, \quad G(i) \approx -.0097 + .098i, \quad G(-1) \approx -.0917, \quad G(-i) \approx -.0097 - .098i.$$


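Then using the inverse Fourier transform we find that

$$G_4(z) \approx .0003 + .1z + .01z^2 + .002z^3,$$

and rescaling to $x$ (dividing the coefficient of $z^j$ by $(.1)^j$) we find that

$$F_4(x) \approx .0003 + x + x^2 + 2x^3,$$

which is a reasonable approximation to the first four terms in the generating function for the Fibonacci numbers. The small constant term is due to aliasing: the computed $j$th coefficient is really the sum of the coefficients of $z^j, z^{j+4}, z^{j+8}, \ldots$ in $G(z)$, and here it is dominated by the $.0003z^4$ term.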
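The numbers in this example are easy to reproduce. The sketch below is our own and uses plain complex arithmetic rather than an FFT. It samples $G(z) = F(.1z)$ at $1, i, -1, -i$, applies the 4-point inverse transform, and then undoes the scaling. Each computed coefficient equals the sum of the coefficients of $z^j, z^{j+4}, \ldots$, so the constant term comes out near $.0003$ rather than $0$.

```python
def F(x):
    """Fibonacci generating function x / (1 - x - x^2)."""
    return x / (1 - x - x * x)

scale = 0.1                                    # replace x by .1 z
points = [1, 1j, -1, -1j]                      # i^0, i^1, i^2, i^3
values = [F(scale * z) for z in points]        # G(1), G(i), G(-1), G(-i)

# Inverse 4-point transform: c_j = (1/4) * sum_k G(i^k) (-i)^(jk)
c = [sum(v * (-1j) ** (j * k) for k, v in enumerate(values)) / 4 for j in range(4)]

# Undo the scaling: the coefficient of x^j is c_j / (.1)^j
approx = [(c[j] / scale ** j).real for j in range(4)]
print([round(a, 4) for a in approx])           # [0.0003, 1.0005, 1.0008, 2.0013]
```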


The following box gives the general form of this Fourier Transform re- 
version technique. 
The Fourier Reversion Method

Let $S(x)$ be the generating function for the sequence $(s_n)$.

Find a scaling factor $a$ such that $G(z) = S(az)$ and $G(z)$ is reasonably approximated by a polynomial $G_m(z)$ with $m$ coefficients.

Let $\omega$ be the principal $m$th root of unity.

Evaluate $G(z)$ at $1, \omega, \omega^2, \ldots, \omega^{m-1}$.

Use the inverse Fourier transform to calculate the approximate coefficients of $G_m(z)$.

Re-scale to obtain approximations to the first $m$ coefficients of the series for $S(x)$ and hence approximations for the first $m$ terms of $(s_n)$.
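The boxed steps translate almost line for line into code. The sketch below is one possible rendering and not the book's implementation. It assumes that $m$ is a power of two, so that a simple recursive radix-2 FFT can carry out the inverse transform. That FFT is written in the spirit of Section 9.5, but its details here are our own, and the names `fft` and `fourier_reversion` are hypothetical.

```python
import cmath

def fft(a, sign=1):
    """Recursive radix-2 FFT.

    Returns [sum_j a[j] * w**(j*k) for k in range(n)] with w = e^(sign * 2 pi i / n).
    len(a) must be a power of two.
    """
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], sign)
    odd = fft(a[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fourier_reversion(S, a, m):
    """Approximate s_0, ..., s_{m-1} from a formula for the generating function S."""
    w = cmath.exp(2j * cmath.pi / m)                     # principal m-th root of unity
    values = [S(a * w ** k) for k in range(m)]           # G(w^k), where G(z) = S(a z)
    coeffs = [v / m for v in fft(values, sign=-1)]       # inverse transform: G_m's coefficients
    return [(coeffs[j] / a ** j).real for j in range(m)]  # undo the scaling

def fib_gf(x):
    return x / (1 - x - x * x)

approx = fourier_reversion(fib_gf, 0.1, 8)
print([round(v, 4) for v in approx])
```

With $a = .1$ and $m = 8$, the output should be close to $0, 1, 1, 2, 3, 5, 8, 13$. The aliased contributions now start at $z^8$, so they are far smaller than in the 4-point example.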


4.6 Exercises 


Ex 4.1. Using the alphabet $\{x, y\}$, find an exact formula for the number of $n$-strings with an even number of $x$'s.


Ex 4.2. For this exercise, use the notation established in Section 4.1. 
(a) Prove sf) = gi) for all n > 0. Denote the common value by tn. 
(b) Let s, = gf) and tn = gf") = gi”), Use this and equations (4.3) and 
(4.9) through (4.12) to obtain 


$$t_{n+1} = 2t_n + 3t_{n-1}; \quad t_0 = 0, \ t_1 = 1;$$

$$s_{n+2} = 3s_{n+1} + s_n - 3s_{n-1}; \quad s_0 = 1, \ s_1 = 1, \ s_2 = 3;$$

$$s_n = s_{n-1} + 2t_{n-1}, \quad n \ge 1, \ s_0 = 1.$$

(c) Find exact formulas for $s_n$ and $t_n$.


Ex 4.3. Using the alphabet $\{x, y, z\}$, let $v_n$ be the number of $n$-strings in which the number of $y$'s is odd and the number of $z$'s is odd. Find an exact formula for $v_n$.


Ex 4.4. Use the technique of generating functions to find a formula for the $n$th term of

$$s_0 = 0, \quad s_1 = 1, \quad s_n = 5s_{n-1} - 6s_{n-2}.$$


Ex 4.5. Let $f(x) = a_kx^k + \cdots + a_1x + a_0$ be a nonconstant polynomial with complex coefficients and $a_0 \ne 0$. (For instance, the characteristic polynomial of (L) has these properties.) Let $f^R(x) = a_k + a_{k-1}x + \cdots + a_1x^{k-1} + a_0x^k$, the reciprocal polynomial of $f(x)$. Show that $\lambda$ is a non-zero root of $f(x) = 0$ iff $1/\lambda$ is a root of $f^R(x) = 0$.


Ex 4.6. Let $q(x) = (1 - \lambda x)(1 - \bar\lambda x)$ for nonreal $\lambda \in \mathbb{C}$.

(a) Show that for any $p(x) \in \mathbb{R}[x]$ with $\deg(p) < 2n$, there exists a unique $\beta(x) \in \mathbb{C}[x]$ with $\deg(\beta) < n$ such that

$$\frac{p(x)}{q(x)^n} = \frac{\beta(x)}{(1 - \lambda x)^n} + \frac{\bar\beta(x)}{(1 - \bar\lambda x)^n}, \tag{4.42}$$

where $\bar\beta(x)$ denotes the polynomial obtained from $\beta(x)$ by conjugating all coefficients.

(b) Show that for any constant $\beta \in \mathbb{C}$,

$$\frac{\beta}{1 - \lambda x} + \frac{\bar\beta}{1 - \bar\lambda x} = 2\sum_{j \ge 0} \Re(\beta\lambda^j)\, x^j, \tag{4.43}$$

where $\Re(\beta\lambda^j)$ denotes the real part of the complex number $\beta\lambda^j$.


Ex 4.7. Let $a(x), b(x) \in \mathbb{C}[x]$ be two non-zero polynomials. Show that any common divisor of $a(x)$ and $b(x)$ divides the last non-zero remainder $R_n(x)$ in the Euclidean Algorithm applied to $a(x)$ and $b(x)$. This proves that $\alpha^{-1}R_n(x)$ is the gcd of $a(x)$ and $b(x)$, where $\alpha$ is the leading coefficient of $R_n(x)$.


Ex 4.8. As established in Chapter 3, any operator of the form $L = I - c_1\sigma - \cdots - c_k\sigma^k$ is an invertible operator on $S^+$ and is also invertible as a formal power series. We claimed that the coefficients of $L^{-1} = \mathrm{ch}^R(\sigma)^{-1} = \sum_{n \ge 0} a_n\sigma^n$ satisfy

$$a_n = c_1a_{n-1} + c_2a_{n-2} + \cdots + c_ka_{n-k} \quad \text{for all } n \ge k.$$

Use results from generating functions to show this.


Ex 4.9. If $L(\sigma)$ is the shift operator associated with the initial value problem


SO 0, S] il. > Sn Asn—1 = Asn_—92 5 


show that its inverse is the operator

$$\sum_{n \ge 0} s_{n+1}\sigma^n.$$


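Ex 4.10. Verify that the partial fraction expansion of the generating function for

$$s_0 = 0, \quad s_1 = 1, \quad s_n = 4s_{n-1} - 4s_{n-2} + 3^n(n - 1)$$

is

$$\frac{19}{2} \cdot \frac{1}{(1 - 2x)^2} + \frac{71}{2} \cdot \frac{1}{1 - 2x} + \frac{9}{(1 - 3x)^2} - \frac{54}{1 - 3x}.$$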
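A quick sanity check, which is not a substitute for the verification the exercise asks for, is to compare terms. The sketch below (function names ours) generates $s_n$ from the recurrence and, separately, from the four partial fractions. It uses $1/(1 - cx) = \sum c^n x^n$ and $1/(1 - cx)^2 = \sum (n+1)c^n x^n$, with exact rational arithmetic.

```python
from fractions import Fraction

def by_recurrence(N):
    """s_0 = 0, s_1 = 1, s_n = 4 s_{n-1} - 4 s_{n-2} + 3^n (n - 1)."""
    s = [0, 1]
    for n in range(2, N):
        s.append(4 * s[n - 1] - 4 * s[n - 2] + 3 ** n * (n - 1))
    return s

def by_partial_fractions(n):
    """Coefficient of x^n in (19/2)/(1-2x)^2 + (71/2)/(1-2x) + 9/(1-3x)^2 - 54/(1-3x)."""
    return (Fraction(19, 2) * (n + 1) * 2 ** n + Fraction(71, 2) * 2 ** n
            + 9 * (n + 1) * 3 ** n - 54 * 3 ** n)

terms = by_recurrence(20)
print(terms[:6])
print(all(terms[n] == by_partial_fractions(n) for n in range(20)))
```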


Ex 4.11. (a) Find the generating function of $s_n = n^2$.

(b) Show that the generating function of the sequence $s_n = 1^2 + 2^2 + \cdots + n^2$ is

$$\frac{x(1 + x)}{(1 - x)^4}.$$


Ex 4.12. (a) If $S(x)$ is the generating function for the sequence $(s_n)$, show that $S(x)/(1 - x)$ is the generating function for the sequence of partial sums $\sum_{j=0}^{n} s_j$.

(b) Use part (a) to prove the Fibonacci identity

$$F_0 + F_1 + \cdots + F_n = F_{n+2} - 1.$$


Ex 4.13 (L'Hôpital's Rule). Let $p(x), q(x) \in \mathbb{C}[x]$ be polynomials.

(a) If $\alpha \in \mathbb{C}$ is a root of $q(x)$, show $q(x) = (x - \alpha)q_1(x)$ for some polynomial $q_1(x)$ and that $q'(\alpha) = q_1(\alpha)$.

(b) Suppose $\alpha$ is a root of $p(x)$ and a simple root of $q(x)$. Show that

$$\lim_{x \to \alpha} \frac{p(x)}{q(x)} = \frac{p'(\alpha)}{q'(\alpha)}.$$


Ex 4.14. Use the generating function method to find an exact formula for the $n$th term of

$$s_0 = 5, \quad s_1 = 13, \quad s_n = -4s_{n-1} + 5s_{n-2}.$$


Ex 4.15. Use the generating function method to find a formula for the $n$th term of

$$s_0 = -1, \quad s_1 = 2, \quad s_2 = 14, \quad s_n = s_{n-1} - 4s_{n-2} + 4s_{n-3}.$$


Ex 4.16. Expand into a power series.

Ax? —22—-1
(1+ 2)(1 — 2-32?)

Ex 4.17. Find the sequence with generating function

1 i 1
1-22  (1-—3z2)3"

Ex 4.18. Use the generating function technique to show that

$$s_n = \begin{cases} 2^n & \text{if } n \text{ is even}, \\ 2^n + 2 & \text{if } n \equiv 1 \pmod 4, \\ 2^n - 2 & \text{if } n \equiv 3 \pmod 4 \end{cases}$$

is the solution to

$$(s_0, s_1, s_2) = (1, 4, 4), \quad s_{n+3} = 2s_{n+2} - s_{n+1} + 2s_n.$$


Ex 4.19. (a) Use the technique of generating functions to solve

$$s_0 = 1, \quad s_1 = 0, \quad s_n = 4s_{n-1} - 4s_{n-2} + 3^n(n - 1).$$

(b) Find all solutions to

$$s_n = 4s_{n-1} - 4s_{n-2} + 3^n(n - 1).$$


Ex 4.20. Consider the following coupled pair of initial value problems:

$$s_0 = 1, \quad s_1 = 0, \quad t_0 = 0, \quad t_1 = 1, \quad s_n = 2t_{n-1} + s_{n-2}, \quad t_n = -s_{n-1} + t_{n-2}.$$

Let $S(x)$ and $T(x)$ denote the generating functions of $(s_n)$ and $(t_n)$. Find a system of two equations that relate $S(x)$ and $T(x)$, and use this to solve for the generating functions.


Ex 4.21. We consider a nonhomogeneous recurrence with characteristic polynomial $\mathrm{ch}(x)$ and forcing function $\varphi(n) = \lambda^n p(n)$ for some polynomial $p(x)$ of degree $m \ge 1$.

(a) Show that there exists a polynomial $Q(x)$ with $\deg(Q) \le m$ such that the generating function of every initial value problem for this recurrence has the form

$$S(x) = \frac{d(x)(1 - \lambda x)^{m+1} + x^kQ(x)}{\mathrm{ch}^R(x)(1 - \lambda x)^{m+1}} \tag{4.44}$$

for some polynomial $d(x)$ with $\deg(d) < \deg(\mathrm{ch})$.

(b) Show that for any polynomial $d(x)$ with $\deg(d) < \deg(\mathrm{ch})$, there exists an initial value problem whose generating function has the form given in (4.44).


Ex 4.22. Let $s_n = a\lambda_0^n + a_1(n)\lambda_1^n + \cdots + a_{k-1}(n)\lambda_{k-1}^n$ be a solution to the $k$th order recurrence $s_n = c_1s_{n-1} + c_2s_{n-2} + \cdots + c_ks_{n-k}$. If $\lambda_0$ is a simple eigenvalue of the recurrence, show that

$$a = \lim_{x \to \lambda_0} \frac{(d_0x^{k-1} + d_1x^{k-2} + \cdots + d_{k-1})(x - \lambda_0)}{\mathrm{ch}(x)},$$

where $d_j = s_j - c_1s_{j-1} - \cdots - c_js_0$.
Ex 4.23 (Generalized Fibonacci sequence). For $k \ge 2$ define the sequence $(f_n)$ by

$$f_0 = f_1 = \cdots = f_{k-2} = 0, \quad f_{k-1} = 1 \quad \text{and} \quad f_n = f_{n-1} + f_{n-2} + \cdots + f_{n-k}.$$

(a) Show that for all $x \ne 1$,

$$\mathrm{ch}(x) = \frac{x^k(x - 2) + 1}{x - 1}.$$

(b) Use the last exercise and part (a) to show that if $\lambda_0$ is the largest eigenvalue of the recurrence (later we'll prove that it is the only positive eigenvalue), then $f_n = a\lambda_0^n + d_n$, where $d_n$ has no $\lambda_0^n$ component and

$$a = \frac{\lambda_0 - 1}{\lambda_0^{k-1}\left[(k + 1)\lambda_0 - 2k\right]}.$$
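The formula for $a$ can be checked numerically before it is proved. In the sketch below, which is our own illustration, $\lambda_0$ is found by bisection on $(1, 2)$, where $\mathrm{ch}(1) = 1 - k < 0$ and $\mathrm{ch}(2) = 1 > 0$. The formula is then compared with the ratio $f_n/\lambda_0^n$ for $n = 59$, which should be very close to its limit $a$ because the other eigenvalues are much smaller in modulus.

```python
def dominant_root(k, iterations=100):
    """Positive root of ch(x) = x^k - x^(k-1) - ... - x - 1, by bisection on (1, 2)."""
    def ch(x):
        return x ** k - sum(x ** j for j in range(k))
    lo, hi = 1.0, 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ch(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def generalized_fibonacci(k, count):
    """f_0 = ... = f_{k-2} = 0, f_{k-1} = 1, f_n = f_{n-1} + ... + f_{n-k}."""
    f = [0] * (k - 1) + [1]
    while len(f) < count:
        f.append(sum(f[-k:]))
    return f

results = {}
for k in range(2, 6):
    lam = dominant_root(k)
    a_formula = (lam - 1) / (lam ** (k - 1) * ((k + 1) * lam - 2 * k))
    f = generalized_fibonacci(k, 60)
    a_numeric = f[59] / lam ** 59          # f_n / lambda_0^n tends to a
    results[k] = (a_formula, a_numeric)
    print(k, round(a_formula, 10), round(a_numeric, 10))
```

For $k = 2$ the formula reduces to $1/\sqrt{5}$, the constant in Binet's Formula.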
Ex 4.24. Show that for all positive integers $k$,


ae = — (7) 


where $\binom{1/2}{n}$ is the generalized binomial coefficient given by

$$\binom{1/2}{n} = \frac{(1/2)(-1/2)\cdots(3/2 - n)}{n!}.$$


Ex 4.25. For any k > 1, show that the partial fraction decomposition of 


1 - (eal Ak 
ie os a ee 
has 
pe 
 G=Nie=7 


Ex 4.26. (Refer to the information on convergence given in Appendix B.) Let $\tau(x) = \sum_{n \ge 0} a_nx^n$ be a fixed power series with disk of convergence $|x| < R$.

(a) For any strictly increasing infinite sequence $(n_i)$ of positive integers, consider the power series $\tau_1(x) = \sum_{i \ge 0} a_{n_i}x^{n_i - n_0}$. If $\alpha \in \mathbb{C}$ such that $|\alpha| < R$, show the power series $\tau_1(x)$ also converges at $x = \alpha$.

(b) Show that

$$\lim_{x \to 0} \frac{\tau(x) + \tau(-x)}{2} = a_0.$$


(c) For any positive integer $m$ use part (a) to define the complex-valued function $f_m$ on the open disk $|x| < R$ by


fa(Z) = ys eee” : 


i>0 


Show that 
lim fm(a) = Gmyat + 
z-0 max™ 
(d) Show that $(\sqrt{5} - 1)/2$ is the radius of convergence of the Fibonacci generating function.


Ex 4.27. Verify the pattern in (4.40) and thereby construct an argument to convince yourself that the Reversion Formula holds for general $m \ge 2$.


Ex 4.28. Let S(x) = Toe 
—Ax 
(a) Use the Reversion Formula (4.41) to calculate the first four elements of the sequence with generating function $S(x)$.

(b) Use the Fourier Reversion Method to calculate approximations to the first four elements of this sequence.

(c) Use the partial fraction decomposition of $S(x)$ to verify that you have computed these terms correctly.


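Ex 4.29. For any sequence $(a_n)$, define its exponential generating function by

$$\mathcal{E}(a_n) = \sum_{n \ge 0} \frac{a_n}{n!}x^n.$$

In particular, $\mathcal{E}((-1)^n) = e^{-x}$ and $\mathcal{E}(n!) = 1/(1 - x)$. Show that

(a) For any positive integer $m$, $\mathcal{E}(a_{n+m}) = D^m(\mathcal{E}(a_n))$.

(b) For any polynomial $p(x)$, $\mathcal{E}(p(n)a_n) = p(xD)(\mathcal{E}(a_n))$.

(c) $\mathcal{E}(a_n)\mathcal{E}(b_n) = \mathcal{E}\left(\sum_{k=0}^{n} \binom{n}{k} a_kb_{n-k}\right)$.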


Ex 4.30. For the recurrence

$$a_{n+1} = (n + 1)a_n + (-1)^{n+1} \quad \text{with } a_0 = 1,$$

show directly that $\mathcal{E}(a_n) = \mathcal{E}((-1)^n)\mathcal{E}(n!)$, and then use the last problem to show that

$$a_n = \sum_{i=0}^{n} \frac{n!}{i!}(-1)^i.$$
Ex 4.31. A derangement of the finite set $\{1, 2, \ldots, n\}$ is a permutation of the set in which every element is moved from its original position. Let $d_n$ equal the number of derangements of the set $\{1, 2, \ldots, n\}$. For instance, $d_1 = 0$, $d_2 = 1$, $d_3 = 2$. Let $d_0 = 1$.

(a) Show that $d_n$ is the number of derangements of $\{1, 2, \ldots, n+1\}$ in which $n + 1$ is in the first position and 1 is not in the last place. Use this idea to show that $(d_n)$ satisfies the second order recurrence

$$d_{n+1} = n(d_n + d_{n-1}).$$

(b) Setting $b_{n+1} = d_{n+1} - (n + 1)d_n$, show that $b_{n+1} + b_n = 0$ holds for all $n > 0$. This shows that $(d_n)$ satisfies the first-order recurrence in the last problem.

(c) Find a finite sum that describes the probability that a permutation is a derangement. Show that this probability converges to $1/e$ as $n \to \infty$.
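Here is a short exact-arithmetic experiment, offered only as a sketch. It generates $d_n$ from the recurrence in part (a), compares $d_n/n!$ with the partial sums $\sum_{i=0}^{n} (-1)^i/i!$ suggested by Exercise 4.30, and shows how close these already are to $1/e$ when $n = 12$.

```python
import math
from fractions import Fraction

def derangements(N):
    """d_0, ..., d_N from d_0 = 1, d_1 = 0 and d_{n+1} = n (d_n + d_{n-1})."""
    d = [1, 0]
    for n in range(1, N):
        d.append(n * (d[n] + d[n - 1]))
    return d

N = 12
d = derangements(N)
probabilities = [Fraction(d[n], math.factorial(n)) for n in range(N + 1)]
partial_sums = [sum(Fraction((-1) ** i, math.factorial(i)) for i in range(n + 1))
                for n in range(N + 1)]
print(probabilities == partial_sums)
print(float(probabilities[N]), 1 / math.e)
```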


5


Nonnegative Difference Equations 


In this chapter we consider nonnegative difference equations. Our prototype, the Fibonacci sequence

$$f_n = f_{n-1} + f_{n-2}; \quad f_0 = 0, \ f_1 = 1,$$

is a nonnegative system because all coefficients and all initial values are nonnegative. We've shown earlier (in Chapter 1) that elements of the Fibonacci sequence have the form given in Binet's Formula

$$f_n = \frac{\lambda_0^n - \lambda_1^n}{\sqrt{5}},$$

where $\lambda_0 = (1 + \sqrt{5})/2$ and $\lambda_1 = (1 - \sqrt{5})/2$ are the roots of the characteristic polynomial, $\mathrm{ch}(x) = x^2 - x - 1$. Although $\lambda_0$ and $\lambda_1$ are irrational, $f_n$ is an integer, and in fact,

$$f_n = \mathrm{Round}(\lambda_0^n/\sqrt{5}) \quad \text{for all } n \ge 0,$$

where $\mathrm{Round}(X)$ is the function that returns the integer nearest to $X$. This implies the asymptotic size $f_n = O(\lambda_0^n)$, where $\lambda_0$ is the positive eigenvalue of the recurrence. We'd like to generalize this example and discover what properties hold for solutions of the generalized problem. For example, for integer $k \ge 3$ the generalized Fibonacci sequence is the $k$th order recurrence
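$$f_n^{(k)} = f_{n-1}^{(k)} + f_{n-2}^{(k)} + \cdots + f_{n-k}^{(k)}$$

with initial values

$$f_0^{(k)} = 0, \quad f_1^{(k)} = 1, \quad f_2^{(k)} = 2, \quad f_3^{(k)} = 2^2, \quad \ldots, \quad f_{k-1}^{(k)} = 2^{k-2}.$$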
We might guess that $f_n^{(k)} = O(\lambda_0^n)$, where $\lambda_0 = \lambda_0(k)$ is some nonnegative number that depends on $k$. A further generalization is the homogeneous nonnegative equation

$$s_n = c_1s_{n-1} + c_2s_{n-2} + \cdots + c_ks_{n-k}, \tag{HNN}$$

where the $c_i$'s are nonnegative and all initial conditions $s_0, \ldots, s_{k-1}$ are nonnegative. (We also assume that at least one of these initial conditions is positive, or else the solution is the zero sequence.) We will investigate whether any additional conditions are needed to ensure the existence of a nonnegative number $\lambda_0$ such that $s_n = O(\lambda_0^n)$. We also consider the asymptotic size of solutions to nonhomogeneous nonnegative equations

$$s_n = c_1s_{n-1} + c_2s_{n-2} + \cdots + c_ks_{n-k} + g(n), \tag{NN}$$

where the $c_i$'s are nonnegative, the initial conditions are nonnegative, and the $g(n)$ are functions with nonnegative values. Here we ask when $s_n = O(g(n))$ or $s_n = O(\lambda_0^n)$ for some $\lambda_0$. As in previous chapters, much information about the recurrence is encoded in its characteristic polynomial.


5.1 Nonnegative Polynomials 


The purpose of this section is to study polynomials of the form

$$p(x) = x^k - c_1x^{k-1} - \cdots - c_{k-1}x - c_k, \quad \text{with all } c_i \ge 0 \text{ and } c_k \ne 0. \tag{5.1}$$

These polynomials are called nonnegative, and they are the characteristic polynomials of nonnegative linear recurrences, that is, recurrences that have the form (NN). The study of these polynomials takes us through some classical areas of mathematics that are often omitted from an undergraduate education but are currently experiencing at least a small revival because of applications to computing, robotics, and other areas.


5.1.1 The dominant root


From the Fundamental Theorem of Algebra (refer to Appendix B) we know that any polynomial $p(x)$ has $\deg(p)$ complex roots when the roots are counted according to multiplicity. It's difficult to say how many of these roots are real. One of the oldest results in this direction is Descartes' Rule of Signs, which relates the number of positive roots of a polynomial with real coefficients to the number of sign changes among its coefficients. For us, a sign change occurs in the polynomial when some coefficient is positive and the next non-zero coefficient is negative, or vice versa. For instance, $x^2 - 2x + 1$ has two sign changes, and $x^{30} + 2x^{20} - 5x^{10} - 7$ has just one sign change. Descartes' Rule bounds the number of positive roots of a real polynomial by its number of sign changes. Here's the statement of the rule and a proof.


Theorem 5.1.1 (Descartes’ Rule of Signs). The number of positive 
roots (counted according to multiplicity) of a polynomial with real coeffi- 
cients is no more than the number of sign changes among its coefficients. 
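Counting sign changes is mechanical. The following sketch (function name ours) skips zero coefficients exactly as in the definition above. It is shown on $x^2 - 2x + 1$ and on a sparse polynomial stored with all of its zero coefficients.

```python
def sign_changes(coeffs):
    """Number of sign changes in a list of real coefficients, highest degree first.

    Zero coefficients are skipped, so a change is counted whenever a
    coefficient and the next non-zero coefficient have opposite signs.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# x^2 - 2x + 1: the double root x = 1 matches the bound of 2
print(sign_changes([1, -2, 1]))        # 2

# x^30 + 2x^20 - 5x^10 - 7, with index i holding the coefficient of x^(30 - i)
coeffs = [0] * 31
coeffs[0], coeffs[10], coeffs[20], coeffs[30] = 1, 2, -5, -7
print(sign_changes(coeffs))            # 1
```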


Proof. We proceed by induction both on the number of sign changes $n$ in the polynomial $p(x) = c_0x^k + \cdots + c_k$ and on the degree $k$. When there are no sign changes, $p(x)$ has the same sign at every positive number and therefore has no positive root. Because of this we may assume $n > 0$. The induction hypothesis is that any polynomial with fewer than $n$ sign changes, or with exactly $n$ sign changes but whose degree is less than $k$, has at most as many positive (real) roots as it has sign changes.

If $c_k = 0$, then $p(x) = xP(x)$ for some polynomial $P(x)$ that also has $n$ sign changes and whose degree is $k - 1$. The polynomial $P(x)$ is covered by our induction hypothesis, and its number of positive roots is therefore at most $n$. Since $p(x) = xP(x)$, every non-zero root of $p(x)$ is a root of $P(x)$. This gives the conclusion for such $p(x)$ and allows us to assume that $c_k \ne 0$.

Without loss of generality we may assume that $p(x)$ has at least one positive root, and set $\lambda_0$ to be the least of its positive roots. Consider the derivative of $p(x)$,

$$p'(x) = kc_0x^{k-1} + (k - 1)c_1x^{k-2} + \cdots + 2c_{k-2}x + c_{k-1}.$$

The number of sign changes in $p'(x)$ either equals the number for $p(x)$ or is one less, depending on whether $c_{k-1}$ and $c_k$ have the same or opposite sign. By the continuity of $p(x)$ and $p'(x)$, between every two different consecutive real roots of $p(x)$ there must be a local extremum and so at least one root of $p'(x)$. Together with Exercise 5.1 this implies that $p(x)$ can have at most one more positive root than $p'(x)$ on the interval $[\lambda_0, \infty)$. The proof is completed by showing that $p'(x)$ has at most $n - 1$ roots in this interval (because then the number of positive roots of $p(x)$ is at most $n$, since $\lambda_0$ was chosen as its least positive root).

If $c_k$ and $c_{k-1}$ have opposite signs, then $p'(x)$ has $n - 1$ sign changes. Our induction hypothesis therefore limits the number of positive roots of $p'(x)$ to $n - 1$, as claimed. If $c_k$ and $c_{k-1}$ have the same sign, both $p(x)$ and $p'(x)$ have $n$ sign changes, and by induction $p'(x)$ has at most $n$ positive roots. We will show that $p'(x)$ has at least one root in the interval $(0, \lambda_0)$, which leaves at most $n - 1$ roots in $[\lambda_0, \infty)$. This follows from a bit of calculus. First considering the case in which $c_k > 0$, the values of both $p(x)$ and $p'(x)$ on any sufficiently small interval containing zero are dominated by their constant terms and therefore are both positive. So $p(x)$ is increasing and positive at $x = 0$ but must decrease to $p(\lambda_0) = 0$, and $p(x)$ has a relative maximum (which is a root of $p'(x)$) on the interval $(0, \lambda_0)$. When $c_k < 0$, this argument can be applied to $q(x) = -p(x)$ to obtain a relative minimum for $p(x)$. In either case, $p(x)$ has a relative extremum on $(0, \lambda_0)$, and continuity yields a root of $p'(x)$ in the interval $(0, \lambda_0)$ as claimed. □


In 1637 Descartes stated this rule of signs without proof in his famous book La Géométrie [149]. During the next two centuries several others proved and refined the rule. Among these was Carl Friedrich Gauss [70], who in 1828 provided the additional information that the difference between the number of positive roots and the number of sign changes is always an even number. Sturm's Theorem (Mémoire sur la résolution des équations numériques, published in 1829) is another classic theorem in this vein. In its simplest version it yields an algorithm for counting the number of real roots of $P(x)$ in any interval. More complicated versions allow the determination of the number of roots of a polynomial $P(x)$ subject to sign constraints on another polynomial $Q(x)$. These theorems fell into relative obscurity until revived by Tarski in 1940 [157] to prove an abstract result in mathematical logic.
Nonnegative polynomials $p(x)$ have exactly one sign change and so can have at most one positive root. On the other hand, since $p(0) = -c_k$ is negative and $\lim_{x \to +\infty} p(x) = +\infty$, the Intermediate Value Theorem implies that $p(x)$ has at least one positive root. Therefore, we obtain the following corollary.


Corollary 5.1.2. A nonnegative polynomial has exactly one positive root 
and it is a simple root. 


We next show that among all roots of a nonnegative polynomial, its sole 
positive root has the largest complex modulus. 


Theorem 5.1.3. If $\lambda_0$ is the positive root of a nonnegative polynomial $p(x)$, then $\lambda_0$ is a dominant root, in the sense that any other root $\lambda \in \mathbb{C}$ satisfies $|\lambda| \le \lambda_0$.


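Proof. If $\lambda$ is any root of $p(x)$, then

$$\lambda^k = c_1\lambda^{k-1} + c_2\lambda^{k-2} + \cdots + c_k,$$

and the absolute value inequality yields

$$|\lambda|^k = |\lambda^k| \le |c_1\lambda^{k-1}| + |c_2\lambda^{k-2}| + \cdots + |c_k| = c_1|\lambda|^{k-1} + c_2|\lambda|^{k-2} + \cdots + c_k.$$

We conclude that

$$p(|\lambda|) = |\lambda|^k - c_1|\lambda|^{k-1} - \cdots - c_k \le 0,$$

and from Exercise 5.2 this implies $|\lambda| \le \lambda_0$. □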
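The characterization from Exercise 5.2 used here (for $x > 0$, $p(x) < 0$ exactly when $x < \lambda_0$) also gives a simple way to compute $\lambda_0$ numerically. The bisection sketch below is our own illustration. Its starting bracket comes from the standard Cauchy bound, under which every root of $p$ satisfies $|\lambda| < 1 + \max_i c_i$.

```python
import math

def positive_root(c, iterations=200):
    """Positive root lambda_0 of p(x) = x^k - c[0] x^(k-1) - ... - c[k-1].

    Assumes every c[i] >= 0 and c[k-1] != 0, so that p is a nonnegative polynomial.
    For x > 0, p(x) < 0 exactly when x < lambda_0, which is all bisection needs;
    p(0) = -c[k-1] < 0 and, by the Cauchy bound, p(1 + max c) > 0.
    """
    def p(x):
        value = 1.0
        for ci in c:                  # Horner's rule
            value = value * x - ci
        return value

    lo, hi = 0.0, 1.0 + max(c)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if p(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

golden = positive_root([1, 1])              # x^2 - x - 1
tribonacci = positive_root([1, 1, 1])       # x^3 - x^2 - x - 1
other_fib_root = (1 - math.sqrt(5)) / 2
print(golden, tribonacci, abs(other_fib_root) <= golden)
```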
Do nonnegative polynomials ever have other roots of modulus $\lambda_0$? Not only is it possible to show that the answer to this question is yes, but we can actually find all roots whose modulus is $\lambda_0$ without doing much work at all. This is described in the next theorem. We again make use of roots of unity (refer to Appendix B), complex numbers $\zeta$ such that $\zeta^n = 1$ for some positive integer $n$.

Theorem 5.1.4. Let $\lambda_0$ be the positive root of the nonnegative polynomial $p(x) = x^k - c_1x^{k-1} - \cdots - c_k$. If the index of imprimitivity $g$ is defined to be the gcd of the set of indices $i$ of the non-zero coefficients $c_i$ in $p(x)$, then $p(x)$ has exactly $g$ roots of modulus $\lambda_0$, and these are the complex numbers of the form $\lambda_0\zeta$, where $\zeta$ is a $g$th root of unity.


Before proving this result we consider a few examples. For a polynomial of the form $p(x) = x^k - 1$ we have $g = k$, and the roots of the polynomial are the $g$th roots of unity. For instance, $x^4 - 1$ has the four roots $\pm 1, \pm i$, which can be written as $i^0, i^1, i^2, i^3$ since $i$ is a primitive fourth root of unity. Next, let's look at the polynomial $p_2(x) = x^4 - x^2 - 1 = (x^2)^2 - (x^2) - 1$. To find its positive root, we apply the quadratic formula to get

$$x^2 = \frac{1 \pm \sqrt{5}}{2},$$

which leads to its positive root

$$\lambda_0 = \sqrt{\frac{1 + \sqrt{5}}{2}}.$$

Since the only non-zero coefficients are $c_2$ and $c_4$, then $g = 2$, the $g$th roots of unity are $\pm 1$, and the theorem correctly says that $\pm\lambda_0$ are the two roots whose modulus is maximal. For a more interesting example, consider the nonnegative polynomial $p_3(x) = x^9 - 3x^6 - 7$. Even though we don't know the value of $\lambda_0$, from the theorem we know that it has exactly three roots of modulus $\lambda_0$, since $\gcd\{3, 9\} = 3$.
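The prediction for $p_3(x) = x^9 - 3x^6 - 7$ can be checked numerically. In the sketch below, which is our own and locates $\lambda_0$ by bisection using Exercise 5.2, $\lambda_0\zeta$ should be a root for each cube root of unity $\zeta$. By contrast, $\lambda_0$ times a primitive 9th root of unity should not be a root.

```python
import cmath
import math
from functools import reduce

k = 9
c = {3: 3, 9: 7}                 # p_3(x) = x^9 - 3x^6 - 7, i.e. c_3 = 3 and c_9 = 7
g = reduce(math.gcd, c)          # index of imprimitivity: gcd of the indices 3 and 9

def p(x):
    return x ** k - sum(ci * x ** (k - i) for i, ci in c.items())

# lambda_0 by bisection: for x > 0, p(x) < 0 exactly when x < lambda_0
lo, hi = 0.0, 1.0 + max(c.values())
for _ in range(200):
    mid = (lo + hi) / 2
    if p(mid) < 0:
        lo = mid
    else:
        hi = mid
lam0 = (lo + hi) / 2

# |p(lambda_0 * zeta)| for the g-th roots of unity zeta: should be ~0
residuals = [abs(p(lam0 * cmath.exp(2j * cmath.pi * r / g))) for r in range(g)]
# ... and for a primitive 9th root of unity: clearly non-zero
other = abs(p(lam0 * cmath.exp(2j * cmath.pi / 9)))
print(g, lam0, residuals, other)
```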


Proof of Theorem 5.1.4. If $\zeta$ is both a second root and a third root of unity, then $\zeta$ is in the set $\{-1, 1\}$ and also in the set $\{1, e^{2\pi i/3}, e^{4\pi i/3}\}$, which implies that $\zeta = 1$. This observation is generalized in Exercise 5.6, where you show that if $g$ is the gcd of a finite set $G$ of integers, then $\zeta$ is a $g$th root of unity iff $\zeta$ is an $i$th root of unity for every $i \in G$. We apply this here with $G = \{i : c_i > 0\}$, the set of indices of the negative coefficients of $p(x)$. Since $\lambda_0$ is the dominant root of $p(x)$, it suffices to prove that $\omega$ is a root of $p(x)$ with $|\omega| = \lambda_0$ iff $\zeta = \omega/\lambda_0$ is an $i$th root of unity for all $i \in G$.

If $\zeta$ is a complex number with $\zeta^i = 1$ for all $i \in G$, then

$$\zeta^{-k}p(\lambda_0\zeta) = \lambda_0^k - \sum_{i \in G} c_i\lambda_0^{k-i}\zeta^{-i} = \lambda_0^k - \sum_{i \in G} c_i\lambda_0^{k-i} = p(\lambda_0) = 0,$$
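and $\omega = \lambda_0\zeta$ is indeed a root of $p(x)$. On the other hand, suppose $\omega = \zeta\lambda_0$ is a root of $p(x)$ with $|\omega| = \lambda_0$, so that $|\zeta| = 1$. Then

$$0 = p(\zeta\lambda_0) = \zeta^k\lambda_0^k - c_1\zeta^{k-1}\lambda_0^{k-1} - \cdots - c_k = \zeta^k\Big(\lambda_0^k - \sum_{i \in G} c_i\zeta^{-i}\lambda_0^{k-i}\Big),$$

and $p(\lambda_0) = 0$ implies

$$\lambda_0^k = c_1\lambda_0^{k-1} + \cdots + c_k = \sum_{i \in G} c_i\lambda_0^{k-i}.$$

Since $\zeta \ne 0$, the factor in parentheses in the first equation vanishes, and subtracting the second equation from it gives

$$0 = \sum_{i \in G} c_i\lambda_0^{k-i}(1 - \zeta^{-i}). \tag{5.2}$$

Notice that since each $\zeta^{-i}$ lies on the unit circle, the real part of each $1 - \zeta^{-i}$ is $1 - \cos(-i\theta)$ (where $\theta$ is the argument of $\zeta$) and so must be nonnegative. Taking the real part of the two sides of (5.2), the left side is 0 and the right side contains only nonnegative terms, since all the $c_i\lambda_0^{k-i}$ with $i \in G$ are positive (remember that $\lambda_0 > 0$). Therefore, the real part $\Re(1 - \zeta^{-i})$ must equal zero for all $i \in G$. Again using the fact that each $\zeta^{-i}$ lies on the unit circle, we note that $\Re(1 - \zeta^{-i}) = 0$ iff $\zeta^{-i} = 1$, and we've proved that $\zeta$ is an $i$th root of unity for all $i \in G$. □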

Returning to our examples, we see that the fact that the index of imprimitivity for $p_2(x)$ and $p_3(x)$ doesn't equal 1 allows us to write the polynomials as $p_2(x) = (x^2)^2 - (x^2) - 1$ and $p_3(x) = (x^3)^3 - 3(x^3)^2 - 7$, displaying a sort of periodicity. Because of this, we call a nonnegative polynomial periodic when its index of imprimitivity does not equal one; otherwise we call it primitive (or aperiodic). Combining Theorem 5.1.3 with Theorem 5.1.4, we obtain the following corollary.

Corollary 5.1.5. Any primitive nonnegative polynomial has a unique root of largest modulus, usually referred to as the strictly dominant root.


5.2 When are integer solutions rounded powers of an eigenvalue?


We've shown that the Fibonacci sequence satisfies

$$f_n = \mathrm{Round}(\lambda_0^n/\sqrt{5}) \quad \text{for all } n \ge 0, \tag{5.3}$$

where $\mathrm{Round}(X)$ returns the integer nearest to $X$. Is this a special property of the Fibonacci sequence, or are there more general sequences with this feature? The proof of (5.3) did use properties of the Fibonacci sequence. For instance, the absolute value of the nondominant eigenvalue is relatively small when compared to $\lambda_0$, and the initial deviations are small. When we speak of deviation here we mean the difference between the actual element of the sequence and the approximation. (Occasionally we may use the term deviation when we mean the absolute value of the deviation.) For example,

$$d_n = f_n - \lambda_0^n/\sqrt{5}$$

is the $n$th Fibonacci deviation. More generally, we want to approximate the elements of an integer recurrence sequence $(s_n)$ by numbers of the form $a\lambda_0^n$, where $\lambda_0$ is the dominant eigenvalue and $a$ is a constant that depends on the initial values. We'd like the absolute values of the deviations $d_n = s_n - a\lambda_0^n$ to start small and stay small, although they need not be decreasing. (The results in this section are based on Capocelli and Cull [23].)
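For the Fibonacci sequence the deviations can be tabulated directly. By Binet's Formula, $d_n = -\lambda_1^n/\sqrt{5}$, so $|d_n|$ is largest at $n = 0$, where it equals $1/\sqrt{5} \approx 0.447 < 1/2$. The floating-point sketch below is our own and confirms this for the first 40 terms.

```python
import math

sqrt5 = math.sqrt(5)
lam0 = (1 + sqrt5) / 2           # dominant eigenvalue
lam1 = (1 - sqrt5) / 2           # nondominant eigenvalue

f = [0, 1]
while len(f) < 40:
    f.append(f[-1] + f[-2])

# n-th Fibonacci deviation d_n = f_n - lambda_0^n / sqrt(5)
d = [f[n] - lam0 ** n / sqrt5 for n in range(40)]

# Binet's Formula predicts d_n = -lambda_1^n / sqrt(5)
binet_gap = max(abs(d[n] + lam1 ** n / sqrt5) for n in range(40))
largest = max(abs(x) for x in d)
print(binet_gap, largest)
```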

Let's first consider the special case in which the eigenvalues $\lambda_0, \ldots, \lambda_{k-1}$ are all simple. From previous work we know that then $s_n$ can be written as

$$s_n = \sum_{i=0}^{k-1} a_i\lambda_i^n$$

for some constants $a_0, \ldots, a_{k-1} \in \mathbb{C}$. For the approximation of $s_n$ by $a_0\lambda_0^n$ the deviations satisfy

$$d_n = s_n - a_0\lambda_0^n = \sum_{i=1}^{k-1} a_i\lambda_i^n,$$

and the familiar absolute value inequality gives

$$|d_n| \le \sum_{i=1}^{k-1} |a_i||\lambda_i|^n. \tag{5.4}$$

For instance, if each nondominant eigenvalue were to satisfy $|\lambda_i| \le 1$, then $s_n = \mathrm{Round}(a_0\lambda_0^n)$ holds whenever $\sum_{i=1}^{k-1} |a_i| < 1/2$. This result isn't as good as it might seem, because in order to apply it we have to calculate all of $a_1, \ldots, a_{k-1}$! In an effort to obtain a more effective result, let's perform a more thorough analysis of (5.4) from a graphical point of view. The deviations can be plotted as points $(n, d_n)$, which can dance around (perhaps erratically) beneath the envelope formed by the bounding function. If the deviations behave very irregularly, any upper bound on them may be a dramatic overestimation. One possible type of irregularity is a spiking behavior, where the deviations might be close to zero for a while, say for $d_0, d_1, \ldots, d_5$, but then $d_6$ is relatively large in absolute value. Such spiking could occur if $\lambda_1 = (1 - \epsilon)\omega$, where $\epsilon$ is a small positive number and $\omega$ is a primitive 6th root of unity. Longer period spiking is also possible. For instance, if $\omega_i$ is a primitive $i$th root of unity with $\lambda_1 = (1 - \epsilon_1)\omega_3$ and $\lambda_2 = (1 - \epsilon_2)\omega_5$, then spiking behavior with period 15 can occur if the period-3 spike augments the period-5 spike to give a large spike of period 15. It's possible to construct sequences with even longer periods, because a number of short periods could join together in such a way as to give a long period. The moral here is that one pays for the smoothness of the bound, since the estimating curve might not be very close to the deviations. For general recurrences, the enveloping curve given by the upper bound in (5.4) may well be the best easy estimate on the deviations. We will now show an additional restriction on the characteristic polynomial that can give a stronger result that is often quick and easy to check.

Notice that since $\lambda_0$ is a simple eigenvalue, any solution has the form

$$s_n = a\lambda_0^n + a_1(n)\lambda_1^n + \cdots + a_{k-1}(n)\lambda_{k-1}^n \quad \text{for all } n \ge 0,$$

for some constant $a$ and polynomials $a_1(x), \ldots, a_{k-1}(x)$.


Theorem 5.2.1. Let $s_n = a\lambda_0^n + a_1(n)\lambda_1^n + \cdots + a_{k-1}(n)\lambda_{k-1}^n$ be an integer solution to (HNN) with dominant eigenvalue $\lambda_0$. Set $d_n = s_n - a\lambda_0^n$ for all $n \ge 0$ and let $M = \max\{|d_0|, \ldots, |d_{k-1}|\}$, the maximum of the initial deviations. We further assume that $f(x) = (x - 1)\mathrm{ch}(x)/(x - \lambda_0)$ is a nonnegative polynomial.

(a) If $M < 1/2$, then $s_n = \mathrm{Round}(a\lambda_0^n)$ for all $n \ge 0$.

(b) If $f(x)$ is primitive, $s_n = \mathrm{Round}(a\lambda_0^n)$ for all sufficiently large $n$ because there is an $n_0$ with $\max\{|d_{n_0}|, \ldots, |d_{n_0+k-1}|\} < 1/2$.
Proof. There exist nonnegative $b_1, \ldots, b_k$ such that $f(x) = x^k - b_1x^{k-1} - \cdots - b_k$, where $1 - b_1 - b_2 - \cdots - b_k = 0$, since $x = 1$ is a root of $f(x)$. We rewrite this as $\sum b_i = \sum |b_i| = 1$ and obtain that any solution $(y_n)$ of the homogeneous recurrence with characteristic polynomial $f(x)$ must satisfy

$$|y_n| \le \max\{|y_0|, |y_1|, \ldots, |y_{k-1}|\} \quad \text{for all } n. \tag{5.5}$$

(Refer to Exercise 5.9.) Since

$$d_n = s_n - a\lambda_0^n = a_1(n)\lambda_1^n + \cdots + a_{k-1}(n)\lambda_{k-1}^n$$

is a solution to the difference equation with characteristic polynomial $f(x)$, (5.5) holds with $y_n = d_n$. Therefore, if the hypothesis of (a) holds, all deviations satisfy $|d_n| \le M < 1/2$, and $s_n = \mathrm{Round}(a\lambda_0^n)$.
In part (b), the fact that $f$ is primitive means that its only positive root $x = 1$ is strictly dominant. Ordering the roots by

$$1 > |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_{k-1}|,$$

we have

$$|d_n| \le \sum_{i=1}^{k-1} |a_i(n)||\lambda_i|^n \le |\lambda_1|^n \sum_{i=1}^{k-1} |a_i(n)|. \tag{5.6}$$
Since each $a_i(n)$ is a polynomial whose degree is less than the multiplicity of $\lambda_i$, then $\deg(a_i) < k$ and we can find positive constants $N, C$ such that each $|a_i(n)|$ is less than $Cn^k$ for all $n \ge N$. Equation (5.6) gives

$$|d_n| \le |\lambda_1|^n Ckn^k \quad \text{for all } n \ge N.$$

Since $|\lambda_1|^n$ is exponentially decreasing to 0 while $n^k$ is only growing at a polynomial rate, the inequality $|d_n| < 1/2$ eventually holds, and part (a) yields $s_n = \mathrm{Round}(a\lambda_0^n)$ for sufficiently large $n$. □


Do we have to explicitly find $f(x)$ (and so calculate $\lambda_0$) in order to apply this result? Fortunately, there is a relatively easy condition that ensures that $f$ is nonnegative.

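Let $(H_n)$ be the sequence defined by the first-order recurrence

$$H_0 = 1, \quad H_n = \lambda_0H_{n-1} - w(n),$$

where $w(n)$ is the sequence whose first $k$ terms $w(1), \ldots, w(k)$ are the coefficients $c_1, \ldots, c_k$ and all later terms are zero. Since $\lambda_0$ is a root of $\mathrm{ch}(x) = x^k - c_1x^{k-1} - \cdots - c_k$, then

$$H_k = \lambda_0H_{k-1} - c_k = \lambda_0(\lambda_0H_{k-2} - c_{k-1}) - c_k = \cdots = \mathrm{ch}(\lambda_0) = 0,$$

and Theorem 3.1.2 gives

$$H_n = \begin{cases} \lambda_0^n - \sum_{i=1}^{n} c_i\lambda_0^{n-i} & \text{for } n \le k, \\ 0 & \text{for } n > k. \end{cases} \tag{5.7}$$

Also, it can be checked that

$$(x - \lambda_0)(x^{k-1} + H_1x^{k-2} + \cdots + H_{k-1}) = \mathrm{ch}(x),$$

and the polynomial $f(x)$ is

$$
\begin{aligned}
f(x) &= \frac{(x - 1)\mathrm{ch}(x)}{x - \lambda_0} = (x - 1)(x^{k-1} + H_1x^{k-2} + \cdots + H_{k-1}) \\
&= x^k - (1 - H_1)x^{k-1} - (H_1 - H_2)x^{k-2} - \cdots - (H_{k-2} - H_{k-1})x - H_{k-1}.
\end{aligned}
$$

The nonnegativity of $f(x)$ is therefore equivalent to

$$1 \ge H_1 \ge H_2 \ge \cdots \ge H_{k-1} > 0, \tag{5.8}$$

where the condition $H_{k-1} > 0$ is free because

$$0 = H_k = \lambda_0H_{k-1} - c_k \quad \text{and} \quad H_{k-1} = c_k/\lambda_0 > 0.$$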
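Condition (5.8) can be tested numerically once $\lambda_0$ is known. The sketch below uses hypothetical helper names and finds $\lambda_0$ by bisection. It builds $H_0, \ldots, H_k$ from $H_n = \lambda_0H_{n-1} - w(n)$ and checks $1 \ge H_1 \ge \cdots \ge H_{k-1} > 0$, allowing a small floating-point tolerance. The second example, $x^3 - x^2 - 5$, fails because its $\lambda_0 \approx 2.12$ makes $H_1 = \lambda_0 - 1 > 1$.

```python
def positive_root(c, iterations=200):
    """Positive root of ch(x) = x^k - c_1 x^(k-1) - ... - c_k, where c = [c_1, ..., c_k]."""
    def ch(x):
        value = 1.0
        for ci in c:                  # Horner's rule
            value = value * x - ci
        return value
    lo, hi = 0.0, 1.0 + max(c)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ch(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def h_sequence(c, lam0):
    """H_0, ..., H_k with H_0 = 1 and H_n = lam0 * H_{n-1} - c_n for 1 <= n <= k."""
    H = [1.0]
    for cn in c:
        H.append(lam0 * H[-1] - cn)
    return H

def satisfies_5_8(c, tol=1e-12):
    """Check 1 >= H_1 >= ... >= H_{k-1} > 0, i.e. that f(x) is nonnegative."""
    H = h_sequence(c, positive_root(c))[: len(c)]     # H_0, ..., H_{k-1}
    monotone = all(H[i] >= H[i + 1] - tol for i in range(len(H) - 1))
    return monotone and H[-1] > 0

print(satisfies_5_8([1, 1, 1]))   # ch(x) = x^3 - x^2 - x - 1
print(satisfies_5_8([1, 0, 5]))   # ch(x) = x^3 - x^2 - 5
```

As a further check, the last entry $H_k$ of `h_sequence` should be numerically zero, since it equals $\mathrm{ch}(\lambda_0)$.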


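It might seem that testing $H_1 = \lambda_0 - c_1 \le 1$ requires knowledge of $\lambda_0$. But recall (refer to Exercise 5.2) that we have the characterization of $\lambda_0$ given by

$$\text{for positive } x, \quad \mathrm{ch}(x) \ge 0 \iff x \ge \lambda_0,$$

so $H_1 \le 1$, that is, $\lambda_0 \le c_1 + 1$, holds exactly when $\mathrm{ch}(c_1 + 1) \ge 0$,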
a condition that is practical and quick to check: $H_1 \le 1$ holds exactly when $ch(c_1 + 1) \ge 0$. (In Section 5.4 we discuss Horner’s Method, in which polynomial evaluation is performed in at most $k$ multiplications.) When the strict inequality $ch(c_1 + 1) > 0$ holds, we have $H_1 < 1$, so the coefficient $1 - H_1$ of $-x^{k-1}$ in $f(x)$ is positive, and so $f(x)$ is definitely primitive. As far as the other conditions in (5.8), from (5.7) we know that for all $n < k - 1$,


$H_{n+1} = \lambda_0^{n+1} - \sum_{i=1}^{n+1} c_i\lambda_0^{n+1-i} = \lambda_0^n(\lambda_0 - c_1) - \sum_{i=2}^{n+1} c_i\lambda_0^{n+1-i},$
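which gives

$H_n - H_{n+1} = \lambda_0^n(c_1 + 1 - \lambda_0) + \sum_{i=1}^{n} (c_{i+1} - c_i)\lambda_0^{n-i}.$

Therefore, $H_n \ge H_{n+1}$ is implied by the conditions

$c_1 + 1 \ge \lambda_0 \quad\text{and}\quad c_{n+1} \ge c_n \ge \cdots \ge c_1,$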


which are easy-to-check restrictions on the coefficients of the original characteristic polynomial $ch(x)$. These observations are collected in the following theorem.
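The recurrence defining $(H_n)$ is exactly Horner’s rule applied to $ch(\lambda_0)$, so once $\lambda_0$ is known the conditions (5.8) can be checked mechanically. The following Python sketch assumes $\lambda_0$ is supplied by the caller (for the recurrence (5.10) studied below it is exactly 3); the function names `horner_sequence` and `f_coefficients` are illustrative, not from the text.

```python
def horner_sequence(c, lam0):
    """Return [H_0, H_1, ..., H_k] with H_0 = 1 and H_n = lam0*H_{n-1} - c_n.

    c = [c_1, ..., c_k] are the coefficients of ch(x) = x^k - c_1 x^{k-1} - ... - c_k,
    and lam0 is (an approximation of) its dominant root, so H_k should be ~0.
    """
    H = [1.0]
    for cn in c:
        H.append(lam0 * H[-1] - cn)
    return H


def f_coefficients(H):
    """Coefficients of f(x) = x^k - (1 - H_1) x^{k-1} - ... - (H_{k-2} - H_{k-1}) x - H_{k-1}.

    f is a nonnegative polynomial exactly when every returned value is >= 0.
    """
    k = len(H) - 1
    return [H[i] - H[i + 1] for i in range(k - 1)] + [H[k - 1]]


H = horner_sequence([2, 2, 3], 3)    # recurrence (5.10), lambda_0 = 3
print(H)                             # [1.0, 1.0, 1.0, 0.0]
print(f_coefficients(H))             # [0.0, 0.0, 1.0], i.e. f(x) = x^3 - 1
```

Here $f(x) = x^3 - 1$ is nonnegative but not primitive, which fits the fact that $ch(c_1 + 1) = ch(3) = 0$ for this recurrence.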


Theorem 5.2.2 (The Rounding Theorem). Let $\lambda_0$ be the dominant eigenvalue of a homogeneous nonnegative recurrence (HNN) whose coefficients satisfy

(5.9) $c_{k-1} \ge \cdots \ge c_1 \quad\text{and}\quad ch(c_1 + 1) \ge 0.$

Suppose $(s_n)$ is an integer solution of (HNN). Then:
(a) If $\max\{|d_0|, \ldots, |d_{k-1}|\} < 1/2$, then $s_n = \mathrm{Round}(a\lambda_0^n)$ for all $n \ge 0$.
(b) If the strict inequality $ch(c_1 + 1) > 0$ holds, then $s_n = \mathrm{Round}(a\lambda_0^n)$ for all $n \ge n_0$, where $n_0$ is such that $\max\{|d_{n_0}|, \ldots, |d_{n_0+k-1}|\} < 1/2$.
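Because (5.9) involves only the coefficients, the hypotheses of the Rounding Theorem can be tested without computing $\lambda_0$. A minimal sketch, assuming the coefficients are given as a Python list `[c_1, ..., c_k]`; `ch_value` and `rounding_hypotheses` are hypothetical helper names. The value $ch(c_1 + 1)$ is computed by Horner’s rule in exact integer arithmetic.

```python
def ch_value(c, x):
    """Evaluate ch(x) = x^k - c_1 x^{k-1} - ... - c_k by Horner's rule."""
    value = 1
    for ci in c:
        value = value * x - ci
    return value


def rounding_hypotheses(c):
    """Check (5.9) for c = [c_1, ..., c_k].

    Returns (holds, strict): `holds` is True when c_1 <= ... <= c_{k-1}
    and ch(c_1 + 1) >= 0; `strict` additionally requires ch(c_1 + 1) > 0,
    the extra hypothesis of part (b).
    """
    monotone = all(c[i] <= c[i + 1] for i in range(len(c) - 2))
    value = ch_value(c, c[0] + 1)
    return monotone and value >= 0, monotone and value > 0


print(rounding_hypotheses([1, 1, 1]))   # tribonacci: (True, True)
print(rounding_hypotheses([2, 2, 3]))   # recurrence (5.10): (True, False)
```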


5.2.1 Using the Rounding Theorem 


As we’ve seen before, the Fibonacci recurrence can be generalized to a $k$th order recurrence for any $k \ge 3$ by

$f_0 = 0,\ f_1 = 1,\ f_j = 2^{j-2} \text{ for } 2 \le j \le k-1,$
$f_n = f_{n-1} + f_{n-2} + \cdots + f_{n-k},$

which has

$ch(c_1 + 1) = ch(2) = 2^k - 2^{k-1} - \cdots - 2 - 1 = 2^k - (2^k - 1) = 1 > 0$

and $c_{j+1} \ge c_j$, since each $c_j$ is 1. Hence the Rounding Theorem ensures the existence of some $a$ and $n_0 \ge 0$ such that $f_n = \mathrm{Round}(a\lambda_0^n)$ for $n \ge n_0$,




where the values of $a$ and $n_0$ depend on the initial conditions. We can work backwards from the initial values and obtain an equivalent set of initial conditions, $f_{-(k-2)} = f_{-(k-3)} = \cdots = f_{-1} = f_0 = 0$ and $f_1 = 1$. (This translation doesn’t change the value of either $a$ or $n_0$ and allows an easier computation of $a$ and the deviations.) From Exercise 4.23,

$a = \dfrac{\lambda_0 - 1}{\lambda_0[(k+1)\lambda_0 - 2k]},$


and the corresponding deviations are

$d_{-(k-2)} = 0 - a\lambda_0^{-(k-2)},$
$d_{-(k-3)} = 0 - a\lambda_0^{-(k-3)},$
$\quad\vdots$
$d_0 = 0 - a,$
$d_1 = 1 - a\lambda_0.$
Notice that $\max\{|d_{-(k-2)}|, |d_{-(k-3)}|, \ldots, |d_0|, |d_1|\} = \max\{|d_0|, |d_1|\}$ because $ch(1) = -(k-1)$ implies $\lambda_0 > 1$. Also, $d_1$ is positive, because otherwise $d_n$ is always negative and so has a $\lambda_0^n$ component. If we can show that both $a < 1/2$ and $1 - a\lambda_0 < 1/2$, then we can take $n_0 = -(k-2)$, and the generalized Fibonacci numbers can be calculated by $f_n = \mathrm{Round}(a\lambda_0^n)$ for all $n \ge -(k-2)$.
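Before proving the two inequalities, the claim can be checked numerically for small $k$. The sketch below is a check, not a proof: it finds $\lambda_0$ by bisection on the interval from 1 to $c_1 + \cdots + c_k = k$, uses the formula for $a$ from Exercise 4.23 quoted above, and compares $f_n$ with $\mathrm{Round}(a\lambda_0^n)$, taking Round to mean rounding half up. The helper names are hypothetical, and floating-point arithmetic limits the check to moderate $n$.

```python
import math


def dominant_root(c, steps=100):
    """Positive root of x^k - c_1 x^{k-1} - ... - c_k by bisection on [1, sum(c)]
    (assumes sum(c) > 1, so the root lies in that interval)."""
    def ch(x):
        value = 1.0
        for ci in c:
            value = value * x - ci
        return value

    lo, hi = 1.0, float(sum(c))
    for _ in range(steps):
        mid = (lo + hi) / 2
        if ch(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def knacci(k, n_max):
    """f_0, ..., f_{n_max} for f_n = f_{n-1} + ... + f_{n-k},
    started from f_{-(k-2)} = ... = f_0 = 0 and f_1 = 1."""
    f = [0] * (k - 1) + [1]          # f_{-(k-2)}, ..., f_0, f_1
    while len(f) < n_max + k - 1:
        f.append(sum(f[-k:]))
    return f[k - 2:]                 # drop the negative indices


def rounding_holds(k, n_max=30):
    """True if f_n == Round(a * lam0**n) for 0 <= n <= n_max."""
    lam0 = dominant_root([1] * k)
    a = (lam0 - 1) / (lam0 * ((k + 1) * lam0 - 2 * k))
    return all(fn == math.floor(a * lam0**n + 0.5)
               for n, fn in enumerate(knacci(k, n_max)))


print([rounding_holds(k) for k in range(2, 7)])
```

For $k = 2$ the formula gives $a = 1/\sqrt5$, the familiar constant in $F_n = \mathrm{Round}(\varphi^n/\sqrt5)$.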

To show that $1 - a\lambda_0 < 1/2$, it suffices to have

$a\lambda_0 = \dfrac{\lambda_0 - 1}{(k+1)\lambda_0 - 2k} > \dfrac12.$

From Exercise 5.4, the denominator is positive, and the requirement can be written as $0 > (k-1)(\lambda_0 - 2)$, which is true because $2 > \lambda_0$.

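To show that $1/2 > a$, we want

$\dfrac12 > \dfrac{\lambda_0 - 1}{\lambda_0[(k+1)\lambda_0 - 2k]},$

which can be rewritten as $2 > (k+1)\lambda_0(2 - \lambda_0)$ (since the denominator is positive). Using $2 - \lambda_0 = \lambda_0^{-k}$, which follows from $(x - 1)ch(x) = x^{k+1} - 2x^k + 1$, this is equivalent to the inequality $2\lambda_0^{k-1} > k + 1$. For $k = 2$, this reduces to $\lambda_0 > 3/2$, which is easy to verify. For $k > 2$, we use $\lambda_0^{k-1} = \lambda_0^{k-2} + \cdots + 1 + \lambda_0^{-1}$ to get $2\lambda_0^{k-1} = 2(\lambda_0^{k-2} + \cdots + \lambda_0^{-1}) > 2(k-1)$, using the fact that $\lambda_0 > 1$. Finally, $2(k-1) \ge k + 1$ if $k \ge 3$, and $1/2 > a$ is established. For another example, let’s consider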


(5.10) $s_n = 2s_{n-1} + 2s_{n-2} + 3s_{n-3}$

with $ch(x) = x^3 - 2x^2 - 2x - 3$, a nonnegative polynomial with dominant root $\lambda_0 = 3$. Here $c_2 = 2 \ge c_1$ and $c_1 + 1 = 3 = \lambda_0$. Because $c_1 + 1$ equals the dominant root, eventual roundability is not ensured but depends on the deviations, which in turn are specified by the initial conditions. Next we look at this recurrence under various choices of initial conditions, using the fact that $a = \frac{1}{13}(s_0 + s_1 + s_2)$. (Refer to Exercise 5.10.)
For $s_0 = 1$, $s_1 = 3$, $s_2 = 9$, we have $a = 1$ and the deviations are:

$d_0 = s_0 - a = 1 - 1 = 0,$
$d_1 = s_1 - a\lambda_0 = 3 - 3 = 0,$
$d_2 = s_2 - a\lambda_0^2 = 9 - 9 = 0,$

and the Rounding Theorem implies

$s_n = \mathrm{Round}(a\lambda_0^n) = \mathrm{Round}(3^n) = 3^n.$

(This sequence could have been obtained directly without using the Rounding Theorem.) For the initial conditions $s_0 = 0$, $s_1 = 0$, $s_2 = 1$, roundability is not obvious without the theorem. The coefficient is $a = 1/13$ with deviations

$d_0 = 0 - 1/13 = -1/13,$
$d_1 = 0 - 3/13 = -3/13,$
$d_2 = 1 - 9/13 = 4/13.$

Since the absolute values of the initial deviations are bounded by 1/2, $s_n = \mathrm{Round}(3^n/13)$ for all $n \ge 0$. Our final choice of initial conditions for


the recurrence (5.10) is $s_0 = 0$, $s_1 = 3$, $s_2 = 9$. Here, $a = 12/13$ and the deviations are

$d_0 = 0 - \frac{12}{13} = -\frac{12}{13},$
$d_1 = 3 - \frac{12}{13}\cdot 3 = \frac{3}{13},$
$d_2 = 9 - \frac{12}{13}\cdot 9 = \frac{9}{13}.$


In this case, the initial deviations are not all less than 1/2 in absolute value, and neither immediate rounding nor eventual rounding is promised by the Rounding Theorem since $c_1 + 1 = \lambda_0$. It is easy to calculate that

$d_3 = 24 - \frac{12}{13}\cdot 3^3 = -\frac{12}{13},$
$d_4 = 75 - \frac{12}{13}\cdot 3^4 = \frac{3}{13},$
$d_5 = 225 - \frac{12}{13}\cdot 3^5 = \frac{9}{13},$

and in Exercise 5.11 you verify that this pattern continues, that is, the sequence of deviations is periodic with period 3. Because $|d_n| \ge 9/13 > 1/2$ whenever $n \not\equiv 1 \pmod 3$, the deviations never settle below 1/2 in absolute value, and even eventual rounding doesn’t occur.
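The three computations above are easy to reproduce with exact rational arithmetic. The sketch below uses Python’s standard `fractions.Fraction` together with the formula $a = (s_0 + s_1 + s_2)/13$ from Exercise 5.10; the function name `deviations` is illustrative.

```python
from fractions import Fraction


def deviations(s0, s1, s2, n_terms=9):
    """Terms s_n of (5.10), s_n = 2 s_{n-1} + 2 s_{n-2} + 3 s_{n-3}, and their
    deviations d_n = s_n - a*3^n, where a = (s_0 + s_1 + s_2)/13."""
    s = [s0, s1, s2]
    while len(s) < n_terms:
        s.append(2 * s[-1] + 2 * s[-2] + 3 * s[-3])
    a = Fraction(s0 + s1 + s2, 13)
    return a, s, [sn - a * 3**n for n, sn in enumerate(s)]


for initial in [(1, 3, 9), (0, 0, 1), (0, 3, 9)]:
    a, s, d = deviations(*initial)
    print(initial, "a =", a, "d =", [str(x) for x in d[:6]])
```

The last case shows the period-3 pattern $-12/13, 3/13, 9/13$ directly.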
5.3.1 Estimation of the dominant root


In Section 5.5 we will show that for nonnegative difference equations, $s_n = \Theta(\lambda_0^n)$ often holds. In fact, when the nonnegative equation is primitive, the dominant eigenvalue $\lambda_0$ is simple and $\lim_{n\to\infty}(s_n/\lambda_0^n) = a$ holds for some constant $a$. Because of this, we want to estimate the dominant root of the recurrence in order to find the long-term behavior of solutions.


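Lemma 5.3.1. The root $\lambda_0$ depends on $c_1 + c_2 + \cdots + c_k$ in the following way:

(1) if $c_1 + \cdots + c_k = 1$, then $\lambda_0 = 1$;
(2) if $c_1 + \cdots + c_k > 1$, then $1 < \lambda_0 < c_1 + \cdots + c_k$;
(3) if $c_1 + \cdots + c_k < 1$, then $c_1 + \cdots + c_k < \lambda_0 < 1$.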


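Proof. We observe that $p(1) = 1 - S$, where $S = c_1 + \cdots + c_k$. Since $\lambda_0$ is the only positive root of $p(x)$, then $\lambda_0 = 1$ iff $S = 1$, which proves (1) and allows us to assume $\lambda_0 \ne 1$. Recall that

$\lambda_0 > 1 \iff 0 > p(1) = 1 - S \iff S > 1,$

the required relative ordering of $\lambda_0$ and 1. Further,

$p(S) = S^k - c_1S^{k-1} - \cdots - c_{k-1}S - c_k$
$\quad = (S - c_1)S^{k-1} - c_2S^{k-2} - \cdots - c_{k-1}S - c_k$
$\quad = (c_2 + \cdots + c_k)S^{k-1} - c_2S^{k-2} - \cdots - c_{k-1}S - c_k$
$\quad = c_2S^{k-2}(S - 1) + c_3S^{k-3}(S^2 - 1) + \cdots + c_k(S^{k-1} - 1)$
$\quad = \sum_{i=2}^{k} c_iS^{k-i}(S^{i-1} - 1),$

and the facts that each $c_i$ is nonnegative and $c_k > 0$ imply

$S > 1 \iff p(S) > 0 \iff S > \lambda_0,$

as required. □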


This lemma gives bounds on the dominant root, positioning it between 1 and $c_1 + \cdots + c_k$. This will be a useful first approximation for Newton’s method in Section 5.4.
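Lemma 5.3.1 translates directly into a starting bracket for the root-finders of Section 5.4. A small sketch with hypothetical helper names, which also checks that $p$ changes sign across the bracket:

```python
def p_value(c, x):
    """p(x) = x^k - c_1 x^{k-1} - ... - c_k, evaluated by Horner's rule."""
    value = 1.0
    for ci in c:
        value = value * x - ci
    return value


def root_bracket(c):
    """Interval (lo, hi) that Lemma 5.3.1 guarantees contains the positive root,
    for c = [c_1, ..., c_k] with all c_i >= 0 and c_k > 0."""
    S = sum(c)
    return min(1.0, S), max(1.0, S)


for c in ([1, 1], [2, 2, 3], [0.25, 0.25]):
    lo, hi = root_bracket(c)
    print(c, (lo, hi), p_value(c, lo) <= 0 <= p_value(c, hi))
```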


5.3.2 Estimation of the second root


While estimates of the dominant root of the recurrence are used to find the asymptotic behavior, bounds on the second eigenvalue tell how quickly and in what sense the solution approaches the long-term behavior. When the nonnegative equation is primitive, there exists a polynomial $B(n)$ such that

$|s_n - a\lambda_0^n| \le B(n)|\lambda_1|^n,$

where $|\lambda_1|$ is the maximum of the absolute values of the nondominant eigenvalues. From this we see that when $|\lambda_1| < 1$, there exist positive constants $\beta, M$ with $M < 1$ such that

(5.11) $|s_n - a\lambda_0^n| \le \beta M^n,$

and $(s_n)$ converges to $a\lambda_0^n$ in the sense of absolute error. On the other hand, when $|\lambda_1| \ge 1$ there exist $\beta$ and $M$ with $1 \le M < \lambda_0$ such that

(5.12) $\left|\dfrac{s_n}{\lambda_0^n} - a\right| \le \beta\left(\dfrac{M}{\lambda_0}\right)^n,$

and $(s_n)$ is said to converge to $a\lambda_0^n$ in the sense of relative error.

In Appendix D we describe an algorithm for counting the number of roots of a polynomial within the (complex) unit circle $|z| < 1$. If the algorithm finds that the $k$th degree characteristic polynomial has at least $k - 1$ roots inside the unit circle, then $|\lambda_1| < 1$, and the above analysis gives absolute error convergence of $(s_n)$ to $a\lambda_0^n$. On the other hand, if it’s found that there are fewer than $k - 1$ roots within the circle, only relative error convergence is obtained from this argument. The method in Appendix D does not actually compute an $M$ with $\lambda_0 > M > |\lambda_1|$. We might instead take the nonnegative polynomial $p(x)$ and its positive root $\lambda_0$ and find the least $M$ for which $(x - M)p(x)/(x - \lambda_0)$ is a nonnegative polynomial. Unfortunately, this method may only yield $M = \lambda_0$. Next we prove some upper bounds on the nondominant roots of a nonnegative polynomial.
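Theorem 5.3.2. Consider a polynomial $p(x) = x^k - c_1x^{k-1} - \cdots - c_k$, where all $c_i$ are strictly positive (and so $p(x)$ is primitive). Then

$M = \max\left\{\dfrac{c_2}{c_1}, \dfrac{c_3}{c_2}, \ldots, \dfrac{c_k}{c_{k-1}}\right\}$

is an upper bound on $|\lambda_1|$.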


Proof. Since the conclusion follows if $M \ge \lambda_0$, we may assume $M < \lambda_0$. Multiplying $p(x) = x^k - c_1x^{k-1} - \cdots - c_{k-1}x - c_k$ by $x - M$ gives

$x^{k+1} - (c_1 + M)x^k + (Mc_1 - c_2)x^{k-1} + \cdots + (Mc_{k-1} - c_k)x + c_kM,$

a polynomial that we call $q(x)$. Since any root of $p(x)$ is also a root of $q(x)$, then $q(\lambda_1) = 0$, which means that

$\lambda_1^{k+1} + (Mc_1 - c_2)\lambda_1^{k-1} + \cdots + (Mc_{k-1} - c_k)\lambda_1 + c_kM = (c_1 + M)\lambda_1^k.$
The definition of $M$ implies that all coefficients on the left side are nonnegative, and taking absolute values gives

$|\lambda_1|^{k+1} + (Mc_1 - c_2)|\lambda_1|^{k-1} + \cdots + c_kM \ge (c_1 + M)|\lambda_1|^k,$

that is, $q(|\lambda_1|) = (|\lambda_1| - M)\,p(|\lambda_1|) \ge 0$. Because $p(x)$ is a nonnegative polynomial, its values change from negative to positive at $x = \lambda_0$, and primitivity implies that $\lambda_0$ is strictly dominant. This means that $p(|\lambda_1|) < 0$, and $q(|\lambda_1|) \ge 0$ gives $|\lambda_1| \le M$. □


For example, if $c_1 > c_2 > c_3 > \cdots > c_k > 0$, then $|\lambda_1| \le M < 1$, and (5.11) says that any nonnegative solution $(s_n)$ has limiting behavior in the absolute error sense, that is, $\lim_{n\to\infty} |s_n - a\lambda_0^n| = 0$.
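The bound $M$ of Theorem 5.3.2 is a one-line computation. In the sketch below (the name `ratio_bound` is hypothetical) it is compared with the exact value of $|\lambda_1|$ for $x^2 - 3x - 2$, whose roots are $(3 \pm \sqrt{17})/2$, and for $x^3 - 2x^2 - 2x - 3 = (x - 3)(x^2 + x + 1)$, whose nondominant roots have modulus 1.

```python
def ratio_bound(c):
    """M = max(c_2/c_1, ..., c_k/c_{k-1}) from Theorem 5.3.2.

    Needs k >= 2 and every c_i > 0; then |lambda_1| <= M.
    """
    if len(c) < 2 or any(ci <= 0 for ci in c):
        raise ValueError("need k >= 2 and all coefficients strictly positive")
    return max(c[i + 1] / c[i] for i in range(len(c) - 1))


lam1 = abs((3 - 17 ** 0.5) / 2)       # nondominant root of x^2 - 3x - 2
print(lam1, ratio_bound([3, 2]))      # about 0.56 and 0.67
print(1.0, ratio_bound([2, 2, 3]))    # |lambda_1| = 1 versus M = 1.5
```

The second example shows that the bound can be far from tight when the coefficients increase.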

If any of $c_1, \ldots, c_{k-1}$ are zero, then Theorem 5.3.2 cannot be used to get an upper bound on $|\lambda_1|$. Another result is the following.


Theorem 5.3.3. Let $p(x) = x^k - c_1x^{k-1} - \cdots - c_k$ be a nonnegative polynomial with positive root $\lambda_0$. If $H_i = \lambda_0^i - c_1\lambda_0^{i-1} - \cdots - c_i$ for all $i = 1, \ldots, k$ (as used earlier in (5.7)), then the maximum of the absolute values of the roots of $p(x)$ excluding $\lambda_0$ satisfies

(5.13) $|\lambda_1| \le \max\left\{H_1, \dfrac{H_2}{H_1}, \ldots, \dfrac{H_{k-1}}{H_{k-2}}\right\}.$


Proof. (In Exercise 5.14 you show that each $H_i$ is positive.) We may assume that the maximum in (5.13) is strictly less than $\lambda_0$, since otherwise the conclusion holds. The roots of the polynomial $p(x)/(x - \lambda_0)$ are $\lambda_1, \ldots, \lambda_{k-1}$, and

$\dfrac{p(x)}{x - \lambda_0} = x^{k-1} + H_1x^{k-2} + \cdots + H_{k-1}.$


Also, for any positive $w$ the polynomial $g(x) = (x - w)p(x)/(x - \lambda_0)$ has only one positive root and

$g(x) = x^k - (w - H_1)x^{k-1} - (H_1w - H_2)x^{k-2} - \cdots - (H_{k-2}w - H_{k-1})x - H_{k-1}w.$

Restricting $w$ to

$w > \max\left\{H_1, \dfrac{H_2}{H_1}, \ldots, \dfrac{H_{k-1}}{H_{k-2}}\right\},$

only the leading coefficient of $g(x)$ is positive, and the coefficient of $x^{k-1}$ is strictly negative. Therefore, $g(x)$ is a primitive polynomial with strictly dominant root $w$. Since $w$ can be taken arbitrarily close to the maximal element of (5.13), $|\lambda_1|$ must be less than or equal to this maximum. □
Example 5.3.1. Consider the generalized Fibonacci polynomial $x^k - x^{k-1} - \cdots - 1$, whose dominant root satisfies $\lambda_0 < 2$. (Refer to Exercise 5.4.) To apply the last theorem we need to bound the various $H$’s. It is true that $1 > H_1 > H_2 > \cdots > H_{k-1}$, because $1 > H_1 = \lambda_0 - 1$ (since $2 > \lambda_0$), and $H_i > H_{i+1}$ is equivalent to $\lambda_0^i > \lambda_0^{i+1} - \lambda_0^i$, that is, to $2 > \lambda_0$. Finally, $H_1 = \lambda_0 - 1 > \lambda_0 - 1/H_i = H_{i+1}/H_i$ because $1 > H_i$. So for the generalized Fibonacci polynomial, the maximum in (5.13) occurs at $H_1$, and the result gives $|\lambda_1| \le \lambda_0 - 1$, a slightly better bound than the bound $|\lambda_1| \le 1$ given by Theorem 5.3.2.


Note that in general, $H_{i+1}/H_i = \lambda_0 - c_{i+1}/H_i$ holds, and this shows:

Corollary 5.3.4. $|\lambda_1| \le \lambda_0 - \min\left\{c_1, \dfrac{c_2}{H_1}, \ldots, \dfrac{c_{k-1}}{H_{k-2}}\right\}.$


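We can also write the result in yet another form. For this, set

$g_i = \sum_{j=i}^{k} c_j\lambda_0^{k-j} = c_i\lambda_0^{k-i} + \cdots + c_{k-1}\lambda_0 + c_k,$

where $0 = p(\lambda_0) = \lambda_0^k - g_1$. Then $\lambda_0^{k-i}H_i = \lambda_0^k - g_1 + g_{i+1} = g_{i+1}$, and therefore

$H_i = \dfrac{g_{i+1}}{\lambda_0^{k-i}} \quad\text{and}\quad \dfrac{H_{i+1}}{H_i} = \lambda_0\dfrac{g_{i+2}}{g_{i+1}},$

and Theorem 5.3.3 can be rewritten as:

Corollary 5.3.5. $|\lambda_1| \le \lambda_0\max\left\{\dfrac{g_2}{g_1}, \dfrac{g_3}{g_2}, \ldots, \dfrac{g_k}{g_{k-1}}\right\}.$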
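The bound (5.13) and its rewritten form in Corollary 5.3.5 can be evaluated side by side once $\lambda_0$ is known. This sketch assumes $\lambda_0$ is supplied by the caller; for the tribonacci polynomial $x^3 - x^2 - x - 1$ we plug in the numerical value $\lambda_0 \approx 1.839286755214161$ and compare with $|\lambda_1| = \lambda_0^{-1/2}$, which holds there because the two nondominant roots are complex conjugates and the product of all three roots is 1. The helper names are hypothetical.

```python
def horner_bound(c, lam0):
    """Right side of (5.13): max(H_1, H_2/H_1, ..., H_{k-1}/H_{k-2}),
    where H_0 = 1 and H_i = lam0*H_{i-1} - c_i."""
    H = [1.0]
    for ci in c[:-1]:                       # H_1, ..., H_{k-1}
        H.append(lam0 * H[-1] - ci)
    return max(H[i + 1] / H[i] for i in range(len(H) - 1))


def g_bound(c, lam0):
    """Corollary 5.3.5: lam0 * max(g_2/g_1, ..., g_k/g_{k-1}),
    where g_i = c_i lam0^{k-i} + ... + c_{k-1} lam0 + c_k."""
    k = len(c)
    g = [sum(c[j] * lam0 ** (k - 1 - j) for j in range(i, k)) for i in range(k)]
    return lam0 * max(g[i + 1] / g[i] for i in range(k - 1))


T = 1.839286755214161                       # tribonacci dominant root (numerical)
print(horner_bound([1, 1, 1], T), g_bound([1, 1, 1], T), T ** -0.5)
print(horner_bound([2, 2, 3], 3), g_bound([2, 2, 3], 3))
```

For the tribonacci polynomial both forms return $\lambda_0 - 1 \approx 0.839$, as Example 5.3.1 predicts; for (5.10) they return 1, which equals $|\lambda_1|$ there.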


5.4 Calculation of the Roots 


As we said earlier, we want an efficient algorithm for computing the dominant root to desired accuracy because the dominant root can be used to find the long-term behavior of solutions. Beginning with any interval $[L, U]$ that contains $\lambda_0$, another interval approximation of $\lambda_0$ can be obtained by computing $p(M)$ for $M = (U + L)/2$. (Refer to Figure 5.1.) If $p(M) = 0$, then $M = \lambda_0$ and we’ve located the root. If $p(M) > 0$, we replace $U$ by $M$ and otherwise replace $L$ by $M$. This method, usually called the Bisection Method, is guaranteed (up to round-off error) to produce a sequence of decreasing intervals that always contain $\lambda_0$. Since the interval is halved at each step, repeating the bisection method for $n$ steps produces an interval of length $(U - L)/2^n$ that is known to contain $\lambda_0$. In order to apply the Bisection Method we need an initial interval that contains the root. For a nonnegative polynomial, the single positive root is between 1 and $c_1 + \cdots + c_k$ (recall Lemma 5.3.1), and so good starting values for $U$ and $L$ are immediately available.
FIGURE 5.1. A nonnegative polynomial with the $U$ and $L$ to be used in calculating the root $\lambda_0$.


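Bisection Method
The following code finds an interval of width at most $E$ that contains the root of the polynomial $p(x)$ when started with an interval $[L, U]$, where $p(L)$ and $p(U)$ have different signs. As written, it assumes $p(L) < 0 < p(U)$, which is the case for a nonnegative polynomial with $L < \lambda_0 < U$.

    WHILE U - L > E DO
        MID := (U + L)/2
        C := p(MID)
        IF C > 0 THEN
            U := MID
        ELSE
            L := MID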

The decision rule used in our procedure for the Bisection Method requires polynomial evaluation. If direct substitution is used, every evaluation of an arbitrary polynomial $p(x) = c_0x^d + c_1x^{d-1} + \cdots + c_d$ of degree $d$ at some $x = a$ generically requires $O(d^2)$ multiplications. The following code, which is usually called Horner’s Method, uses only $d$ multiplications.
Horner’s Method
A polynomial of the form $p(x) = c_0x^d + c_1x^{d-1} + \cdots + c_d$ can be evaluated at $x = a$ by the following code, which uses $d$ multiplications and $d$ additions.

    P := c0
    FOR i := 1 TO d DO
        P := P * a + ci


Note that when the polynomial $p(x)$ is monic (namely, $c_0 = 1$) the terms generated by Horner’s Method in the evaluation of $p(x)$ at $\lambda_0$ are exactly the $H_i$ defined earlier in (5.7). Also,

$p(x) = (x - \lambda_0)(x^{d-1} + H_1x^{d-2} + H_2x^{d-3} + \cdots + H_{d-1}),$

which is why Horner’s method is sometimes called synthetic division. In 1964 Pan [123] showed that Horner’s Method uses the fewest operations to evaluate a polynomial at an arbitrary value when no precomputing using the coefficients is allowed. This does not preclude a method for evaluating a polynomial at $m$ values using fewer than $md$ operations. (Also refer to Exercise 38 in Section 4.6.4 in Knuth [88].)
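A Python version of Horner’s Method that also keeps the intermediate values. For a monic nonnegative polynomial evaluated at $\lambda_0$ these are $H_0, H_1, \ldots$; dropping the final value (the remainder $p(a)$) leaves the coefficients of the quotient by $x - a$, as in synthetic division. The name `horner` and its return convention are illustrative.

```python
def horner(coeffs, a):
    """Evaluate p(a) for p(x) = coeffs[0] x^d + coeffs[1] x^{d-1} + ... + coeffs[d].

    Uses d multiplications and d additions. Returns (p(a), partials), where
    partials[:-1] are the coefficients of the quotient of p(x) by (x - a)
    and partials[-1] == p(a) is the remainder.
    """
    value = coeffs[0]
    partials = [value]
    for c in coeffs[1:]:
        value = value * a + c
        partials.append(value)
    return value, partials


value, partials = horner([1, -2, -2, -3], 3)   # ch(x) of (5.10) at lambda_0 = 3
print(value, partials[:-1])                    # 0 [1, 1, 1], quotient x^2 + x + 1
```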

Because a polynomial of degree $d$ can be evaluated using Horner’s Method in $d$ multiplications and $d$ additions, the Bisection Method will compute the root to an accuracy of $(U - L)/2^n$ in $O(nd)$ steps. Since multiplications often take much more time than additions, the computing time is usually given as only the number of multiplications. Therefore, if $E$ is the designated error bound for the root and $E_0$ the initial error bound, the root can be found within error $E$ using $d\lceil\log_2(E_0/E)\rceil$ multiplications.

While the Bisection Method works reasonably well, there is another method, Newton’s Method (sometimes called the Newton-Raphson method), which usually locates a root more quickly. Again consider the graph of a nonnegative polynomial, but this time include the tangent to the curve $y = p(x)$ at $x = U$ and extrapolate the tangent until it intersects the $x$-axis at some $x$-value, which we name $U_1$. (Refer to Figure 5.2.) This can be done because (refer to Exercise 5.2) for nonnegative polynomials the tangent line at $x = \lambda$ always has a positive slope when $\lambda > \lambda_0$. This also guarantees that $U_1 < U$. (From the graph we see that $U_1$ is greater than the positive root of $p(x)$, which means that it lies between $\lambda_0$ and $U$.) This construction can be repeated using $U_1$ in place of $U$, giving the general iteration

(5.14) $U_{n+1} = U_n - \dfrac{p(U_n)}{p'(U_n)}$
because the value of the derivative $p'(U_n)$ equals the slope of the tangent,

$p'(U_n) = \dfrac{p(U_n) - 0}{U_n - U_{n+1}}.$

The formula in (5.14) is a nonlinear recurrence and defines the iterative procedure known as Newton’s Method. Although there is little chance that we can write a closed-form solution to this recurrence, we would like to know whether it converges to $\lambda_0$ as well as to have an idea of the rate of convergence.


FIGURE 5.2. A nonnegative polynomial with two tangent lines indicated. The 
tangents cross the axis at the successive Newton approximations to the root of 
the polynomial. 


The following is a standard theorem for Newton’s method applied to a 
general polynomial (for a proof refer to [20, 155]). 


Theorem 5.4.1 (Newton’s Method). For any polynomial and any simple real root $R$ there is a small interval around $R$ such that if Newton’s method is started at any point within this interval, the approximations found by Newton’s method will stay in the interval and converge to $R$. Further, if the error $E_i$ is the distance from $R$ at the $i$th step of Newton’s method, then $E_{i+1} = O(E_i^2)$; this is called quadratic convergence.


There are several practical problems with this theorem. One is that it 
only states the existence of the interval and it does not tell how to find it. 
Also, it only promises rapid asymptotic convergence. When the polynomial 
is nonnegative, the graph in Figure 5.2 suggests that Newton’s Method might yield a sequence that is rapidly decreasing to $\lambda_0$.
Recall that we plan to initialize Newton’s Method with either $U_0 = 1$ or $U_0 = c_1 + \cdots + c_k$, according to whether $p(1)$ is positive or negative. We’ve shown that the constructed sequence $(U_n)$ is decreasing, provided each $U_n$ is greater than $\lambda_0$, that is, provided each error $E_n = U_n - \lambda_0$ is positive. The sequence of errors has the form

(5.15) $E_{n+1} = U_{n+1} - \lambda_0 = E_n - \dfrac{p(U_n)}{p'(U_n)} = \dfrac{E_np'(U_n) - p(U_n)}{p'(U_n)},$
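and expanding each of $p(x)$ and $p'(x)$ into its Taylor series about $\lambda_0$ (refer to Appendix B) gives

$p(x) = \sum_{i=1}^{d} \dfrac{p^{(i)}(\lambda_0)}{i!}(x - \lambda_0)^i \quad\text{and also}\quad p'(x) = \sum_{i=0}^{d-1} \dfrac{p^{(i+1)}(\lambda_0)}{i!}(x - \lambda_0)^i,$

where $p^{(i)}(x)$ is the $i$th derivative of $p(x)$ and $d$ is the degree of $p$. This means that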


$E_np'(U_n) - p(U_n) = \sum_{i=0}^{d-1} \dfrac{p^{(i+1)}(\lambda_0)}{i!}E_n^{i+1} - \sum_{i=1}^{d} \dfrac{p^{(i)}(\lambda_0)}{i!}E_n^{i} = \sum_{i=2}^{d}\left(\dfrac{1}{(i-1)!} - \dfrac{1}{i!}\right)p^{(i)}(\lambda_0)E_n^{i}$

and

$E_{n+1} = \sum_{i=2}^{d} \dfrac{(i-1)\,p^{(i)}(\lambda_0)}{i!\,p'(U_n)}E_n^{i},$

where $U_n = \lambda_0 + E_n$. For sufficiently small $E_n$, $p'(U_n) \approx p'(\lambda_0)$ and

$E_{n+1} \approx \dfrac{p''(\lambda_0)}{2p'(\lambda_0)}E_n^2.$

Since every $p^{(i)}(\lambda_0)$ is positive, and $p'(U_n)$ is positive when $U_n > \lambda_0$, every term of the sum is positive whenever $E_n > 0$, so $E_{n+1} > 0$ as well. Therefore, Newton’s method does yield a sequence that is monotonically decreasing to $\lambda_0$, provided the initial value is greater than $\lambda_0$.

If Newton’s method is applied to functions that are not nonnegative polynomials, then various types of odd behavior are possible (refer to [142], [160]). But even for nonnegative polynomials, Newton’s method is only guaranteed to converge to the positive root when the initial value is larger than $\lambda_0$. For example, consider the nonnegative polynomial $f(x) = x^3 - 5x$, which has the three distinct real roots $-\sqrt5$, 0, and $\lambda_0 = \sqrt5$. Newton’s formula for this polynomial can be written as

$N(x) = x - \dfrac{x(x^2 - 5)}{3x^2 - 5} = \dfrac{2x^3}{3x^2 - 5},$


where $\sqrt5 < N(x) < x$ holds for $x > \sqrt5$. When the iteration is started at some $x > \sqrt5$, the iteration will monotonically decrease to $\sqrt5$. But, for instance, if the iteration were started at $x = 1$, then

$N(1) = -1 \quad\text{and}\quad N(-1) = 1,$


and the Newton iteration oscillates with period 2. Such values of period 2 satisfy $x = N^{(2)}(x)$, the two-fold iteration of $N$, and so they must be zeros of the polynomial

$x(x^2 - 1)(x^2 - 5)(4x^4 - 15x^2 + 25),$

where $x = 0$ and $x = \pm\sqrt5$ are fixed points (have period 1). As we saw above, the points $x = \pm1$ are points of period 2, and this polynomial has four other (nonreal) roots, which are also points of period 2.
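The 2-cycle can be confirmed exactly with rational arithmetic, and a nonreal period-2 point can be checked numerically from the quartic factor: with $y = x^2$, the roots of $4y^2 - 15y + 25$ are $y = (15 \pm i\sqrt{175})/8$. The sketch uses only Python’s standard `fractions` and `cmath` modules; the function name `N` follows the text.

```python
import cmath
from fractions import Fraction


def N(x):
    """Newton map for f(x) = x^3 - 5x, written as N(x) = 2x^3 / (3x^2 - 5)."""
    return 2 * x**3 / (3 * x**2 - 5)


# Exact 2-cycle 1 -> -1 -> 1.
print(N(Fraction(1)), N(Fraction(-1)))          # -1 1

# A nonreal point of period 2, taken from x^2 = (15 + i*sqrt(175))/8.
z = cmath.sqrt((15 + 1j * 175 ** 0.5) / 8)
print(abs(N(N(z)) - z), abs(N(z) - z))          # tiny, and clearly nonzero
```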

It is straightforward but extremely tedious to derive an analogous 27th degree polynomial from $x = N^{(3)}(x)$, the third iterate of $N$. After eliminating the three points of period 1 there remain 24 points of period 3, some of which might not be real. Similarly, for any $n$ one can derive a polynomial from $x = N^{(n)}(x)$, a polynomial whose roots have period dividing $n$. Once the points that have a period that is a proper divisor of $n$ are eliminated, all remaining points oscillate with period $n$ under the Newton iteration.

This shows that one should exercise care in picking a good starting point, 
because even something as seemingly simple as Newton’s method applied to 
a nonnegative polynomial can have complicated behavior. In practice, these 
complications are unlikely to arise, because round-off error will usually be 
sufficient to keep the iteration from settling down into periodic behavior. 
A more practical difficulty is starting at or near a zero of the derivative. 
This can throw the iteration into some unexpected region and may lead 
to numerical underflow or overflow. For nonnegative polynomials, starting 
at $x > \lambda_0$ ensures that the derivatives have non-zero values. In summary,
Newton’s method is well-behaved when used to find the positive root of a 
nonnegative polynomial, but you must remember to use a starting value 
that is larger than the root. 
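These recommendations (start to the right of $\lambda_0$, using the bracket of Lemma 5.3.1) combine into a short Python sketch of Newton’s Method (5.14) for the dominant root. It evaluates $p$ and $p'$ together by an extended Horner scheme, starts from $U_0 = 1$ when $p(1) \ge 0$ and from $U_0 = c_1 + \cdots + c_k$ otherwise, and records the iterates so that the monotone decrease can be observed. The function name and the relative stopping rule are our own choices.

```python
def newton_dominant_root(c, tol=1e-14, max_iter=100):
    """Newton's Method (5.14) for the positive root of
    p(x) = x^k - c_1 x^{k-1} - ... - c_k, with c = [c_1, ..., c_k].

    Starts at U_0 = 1 if p(1) >= 0 (root <= 1), otherwise at
    U_0 = c_1 + ... + c_k, so that U_0 >= lambda_0 (Lemma 5.3.1).
    Returns the list of iterates U_0, U_1, ...
    """
    def p_and_dp(x):
        # Extended Horner scheme: p(x) and p'(x) in one pass.
        p, dp = 1.0, 0.0
        for ci in c:
            dp = dp * x + p
            p = p * x - ci
        return p, dp

    u = 1.0 if p_and_dp(1.0)[0] >= 0 else float(sum(c))
    iterates = [u]
    for _ in range(max_iter):
        p, dp = p_and_dp(u)
        step = p / dp
        u -= step
        iterates.append(u)
        if abs(step) <= tol * u:
            break
    return iterates


print(newton_dominant_root([2, 2, 3])[:5])   # decreasing from 7 toward 3
```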

Newton’s method can also be used to approximate the other roots of a 
nonnegative polynomial, but these roots have some extra complications. 
For instance, we do not know a good initial estimate for the second root. 
Also, the second root may be nonreal. While there is a variant of Newton’s 
Method that can be used for complex numbers, other methods are more 
effective [1]. 


5.4.1 The rate of convergence in Newton’s method 


To obtain additional insight into the behavior of Newton’s method for nonnegative polynomials $p(x)$ we look at the sequence of errors. For this it’s helpful to write $E(x) = x - \lambda_0$ and $E' = E(N(x))$. Suppressing all occurrences of the variable $x$, the error equation (5.15) becomes

$E' = E^2a,$
where $a = (Ep' - p)/(p'E^2)$ is a function of $x$. Since $E = x - \lambda_0$ is a factor of $p(x)$, there exists a polynomial $q$ such that $p = Eq$, and using the product rule to calculate the derivative $p' = Eq' + q$, we obtain

(5.16) $a = \dfrac{Ep' - p}{E^2p'} = \dfrac{q'}{p'}.$

This allows us to prove the following result.


Theorem 5.4.2. For a nonnegative polynomial $p(x)$ with positive root $\lambda_0$, the error in Newton’s method started at any $x > \lambda_0$ obeys

$E(N(x)) \le E(x)^2\,\dfrac{p''(\lambda_0)}{2p'(\lambda_0)}.$


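Proof. Since $E(\lambda_0) = 0$ gives $p''(\lambda_0) = 2q'(\lambda_0)$, the result follows from (5.16) if we can show that $a(x)$ is decreasing for $x > \lambda_0$. For this we prove that $a'(x)$ is nonpositive for all $x > \lambda_0$, which is equivalent to

(5.17) $p'q'' - p''q' = \dfrac{p'T_1 - q'T_2}{x} \le 0,$

where

$T_1 = xq'' - (k-2)q' \quad\text{and}\quad T_2 = xp'' - (k-2)p'.$

Recalling that $q(x) = \sum_{i=0}^{k-1} H_ix^{k-1-i}$, where $H_0 = 1, H_1, \ldots, H_{k-1}$ are the positive numbers given by the Horner sequence applied to the evaluation of $p(\lambda_0)$, we see that $q(x)$ and $q'(x)$ are positive for $x > \lambda_0$. Since $p'(x)$ is a nonnegative polynomial whose dominant root must be less than $\lambda_0$, $p'(x)$ is also positive for all $x > \lambda_0$. We complete the proof by showing that $T_1(x) \le 0$ and $T_2(x) \ge 0$ for $x > \lambda_0$. To do this, we note that each of these polynomials has the form $xf''(x) - (k-2)f'(x)$, with $f = q$ for $T_1$ and $f = p$ for $T_2$. Writing $f(x) = \sum_{j=0}^{d} a_jx^{d-j}$ with $d = \deg f$, we have

$xf''(x) - (k-2)f'(x) = \sum_{j=0}^{d}(d-j)(d-j-1)a_jx^{d-j-1} - (k-2)\sum_{j=0}^{d}(d-j)a_jx^{d-j-1} = \sum_{j=0}^{d}(d-j)(d-k+1-j)\,a_jx^{d-j-1}.$

For $f = q$ we have $d = k - 1$, so the factor $d - k + 1 - j = -j$ is nonpositive while every $a_j = H_j$ is positive. For $f = p$ we have $d = k$, so the factor is $1 - j$: the $j = 0$ term is $kx^{k-1} > 0$, the $j = 1$ term vanishes, and for $j \ge 2$ both $1 - j$ and $a_j = -c_j$ are nonpositive. This identity therefore says that $T_1(x) \le 0$ and $T_2(x) \ge 0$ for all positive $x$, and (5.17) is satisfied for all $x > \lambda_0$. □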




This result is helpful because it says that the sequence of errors satisfies $E_i \le a(\lambda_0)^{-1}2^{-2^i}$ after $i$ steps if initially $E_0 \le 1/(2a(\lambda_0))$, where $a(\lambda_0) = p''(\lambda_0)/(2p'(\lambda_0))$. The difficulty is that we don’t know how long it takes for Newton’s Method to reduce the error to less than $1/(2a(\lambda_0))$.

Until now we’ve considered only absolute error, that is, the difference (or deviation) of our estimate from the actual value. Often, comparing the size of the actual error with the size of $\lambda_0$ is more appropriate, and that’s what is meant by the relative error, $\delta(x) = E(x)/\lambda_0$. Rephrasing the above result in terms of relative error, we have

$\delta(N(x)) = \dfrac{E(N(x))}{\lambda_0} = \dfrac{(\delta(x)\lambda_0)^2a(x)}{\lambda_0} \le \delta(x)^2\,\dfrac{\lambda_0p''(\lambda_0)}{2p'(\lambda_0)}.$


The homogeneity of the convergence factor associated with the relative 
error criterion yields the following theorem. 


Theorem 5.4.3. The relative error in Newton’s method for approximating the dominant root of a $k$th degree nonnegative polynomial obeys

$\delta(N(x)) \le (k-1)\,\delta(x)^2.$

In fact, the convergence factor $\lambda_0p''(\lambda_0)/(2p'(\lambda_0))$ is guaranteed to lie in the interval $[(k-1)/2,\ k-1)$.


Proof. Following the proof of Theorem 5.4.2, we note that for any polynomial $f(x) = \sum_{i=0}^{k-1} a_ix^{k-1-i}$ of degree $k - 1$ we have

(5.18) $(k-1)f(x) - xf'(x) = \sum_{i=1}^{k-1} i\,a_ix^{k-1-i}.$

Using this identity for $f(x) = p'(x) = kx^{k-1} - c_1(k-1)x^{k-2} - \cdots - c_{k-1}$, we see that no coefficient in the polynomial $(k-1)p'(x) - xp''(x)$ is positive, and $x = \lambda_0$ gives the lower bound of $(k-1)/2$ for the convergence factor.

From $p'(\lambda_0) = q(\lambda_0)$ and $p''(\lambda_0) = 2q'(\lambda_0)$ we get another representation of the convergence factor, $\lambda_0q'(\lambda_0)/q(\lambda_0)$, that is not computationally helpful unless the quotient $q(x)$ can be determined. However, this representation can be used to obtain the upper bound we want to prove here, namely that

$\lambda_0q'(\lambda_0) < (k-1)q(\lambda_0).$

Recalling that

$q(\lambda_0) = \lambda_0^{k-1} + H_1\lambda_0^{k-2} + \cdots + H_{k-1},$

where each Horner element $H_i$ is positive, (5.18) with $f(x) = q(x)$ gives the required upper bound. □
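The interval $[(k-1)/2, k-1)$ for the convergence factor can be spot-checked with exact arithmetic when $\lambda_0$ is rational. In the sketch below (a hypothetical helper) the derivatives are computed term by term; the examples are (5.10), whose root is $\lambda_0 = 3$, and $x^2 - 4$, which attains the lower end $(k-1)/2 = 1/2$.

```python
from fractions import Fraction


def convergence_factor(c, lam0):
    """lam0 * p''(lam0) / (2 p'(lam0)) for p(x) = x^k - c_1 x^{k-1} - ... - c_k."""
    k = len(c)
    coeffs = [1] + [-ci for ci in c]          # coefficient of x^(k-j) is coeffs[j]
    dp = sum((k - j) * a * lam0 ** (k - j - 1)
             for j, a in enumerate(coeffs[:-1]))
    d2p = sum((k - j) * (k - j - 1) * a * lam0 ** (k - j - 2)
              for j, a in enumerate(coeffs[:-2]))
    return lam0 * d2p / (2 * dp)


print(convergence_factor([2, 2, 3], Fraction(3)))   # 21/13, inside [1, 2)
print(convergence_factor([0, 4], Fraction(2)))      # 1/2, the lower end for k = 2
```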
This result gives doubling convergence for quadratic polynomials. It can be extended to polynomials of any degree.

Corollary 5.4.4. Let $\delta_0$ be the initial relative error when Newton’s method is used to approximate the dominant root of a $k$th degree nonnegative polynomial. If $\delta_0$ satisfies $\delta_0 < 1/k$, then the number of correct digits doubles at each iteration when the computation is done in base $b = k/(k-1)$.

Proof. From the theorem, $\delta_{i+1} \le (k-1)\delta_i^2$, and so

$\delta_i \le \dfrac{((k-1)\delta_0)^{2^i}}{k-1} < \dfrac{1}{k-1}\left(\dfrac{k-1}{k}\right)^{2^i} = \dfrac{b^{-2^i}}{k-1}.$ □


It remains to locate an initial value for which the relative error is suffi- 
ciently small. For this we consider the notion of “scaling”. In this context, 
scaling means replacing $x$ by a new variable $y$ of the form $x = \beta y$. The standard example is approximating a square root. (Refer to Section 9.4.4 for an analysis of the run time when Newton’s method is used to calculate a square root.) The square root of $A > 0$ can be computed using Newton’s method, since it is the dominant root $\lambda_0$ of $p(x) = x^2 - A$. The Newton iteration formula is

$N(x) = x - \dfrac{x^2 - A}{2x},$

or the computationally better

$N(x) = \dfrac{x}{2} + \dfrac{A}{2x}.$


We scale the expression $x^2 - A$ by replacing $x$ by $x = 2^ly$, where $A$ lies inside the interval $4^{l-1} < A \le 4^l$ (such $l$ might be negative). Then $x^2 - A = 0$ iff $y^2 - A/4^l = 0$. Once we scale, the root $\mu_0 = \sqrt{A}/2^l$ of $h(y) = y^2 - A/4^l$ satisfies $1/2 < \mu_0 \le 1$, and when $y = 1$ is chosen as the initial value, we have $E_0 < 1/2$. In this example, the convergence factor for the relative error is

$\dfrac{\mu_0h''(\mu_0)}{2h'(\mu_0)} = \dfrac{2\mu_0}{4\mu_0} = \dfrac12,$

from which we obtain doubling convergence. Hence, the number of correct bits when Newton’s Method is used to find $\mu_0$ at least doubles at each iteration, and $E < 1/2^n$ implies $E' < 1/2^{2n}$. After the root $\mu_0$ has been determined with desired accuracy, the root $\lambda_0$ is easily obtained by multiplying by $2^l$, a binary shift. Scaling helped to identify a good initial value for the Newton iteration, and information about the scaled polynomial easily translates back to information about the original polynomial.
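A Python sketch of this scaled square-root computation, assuming IEEE double precision. It uses the standard-library calls `math.frexp` (to read off the power-of-two exponent of $A$) and `math.ldexp` (to multiply by $2^l$). Its normalization puts the scaled value $c = A/4^l$ in $[1/4, 1)$, so $\mu_0 \in [1/2, 1)$, a slightly different endpoint convention from the text. Since $E_0 \le 1/2$ and each step at least squares the error, six iterations bring the error of the scaled root below $2^{-64}$, beyond double precision.

```python
import math


def scaled_sqrt(A, iterations=6):
    """Square root of A > 0 by Newton's method on h(y) = y^2 - c, where
    A = c * 4**l with 1/4 <= c < 1, started at y = 1, then scaled back by 2**l."""
    if A <= 0:
        raise ValueError("A must be positive")
    m, e = math.frexp(A)              # A = m * 2**e with 0.5 <= m < 1
    if e % 2 == 0:
        l, c = e // 2, m              # c in [1/2, 1)
    else:
        l, c = (e + 1) // 2, m / 2    # c in [1/4, 1/2)
    y = 1.0                           # the root mu_0 = sqrt(c) lies in [1/2, 1)
    for _ in range(iterations):
        y = y / 2 + c / (2 * y)       # N(y) = y/2 + c/(2y)
    return math.ldexp(y, l)           # multiply by 2**l, a binary shift


for A in (2.0, 10.0, 1e-3, 12345.678):
    print(A, scaled_sqrt(A), math.sqrt(A))
```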
In general, if $p(x) = x^k - \sum_{i=1}^{k} c_ix^{k-i}$ is a nonnegative polynomial with dominant root $\lambda_0$, scaling it by $\beta$ yields the new polynomial

$h(y) = \beta^{-k}\left((\beta y)^k - \sum_{i=1}^{k} c_i(\beta y)^{k-i}\right) = y^k - \sum_{i=1}^{k} c_i\beta^{-i}y^{k-i},$

with dominant root $\mu_0 = \lambda_0/\beta$. It can be checked that the relative convergence factor remains unchanged, that is,

$\lambda_0p''(\lambda_0)/p'(\lambda_0) = \mu_0h''(\mu_0)/h'(\mu_0).$

How is this used? First, if it happens that $\sum c_i > 1$, we repeatedly scale by a factor of $\beta = k/(k-1)$ until the coefficient sum either equals 1 (in which case the dominant root equals 1) or is less than 1. The last scaling places the scaled root $\mu_0$ in the interval $((k-1)/k, 1)$. If it happens that $\mu_0$ lies in the subinterval $((k-1)/k, k/(k+1))$ of length $1/(k(k+1))$, then $\delta_0 < 1/(k^2 - 1) < 1/k$ holds for the initial value $U_0 = k/(k+1)$. Otherwise, when $\mu_0 \in (k/(k+1), 1)$, the usual $U_0 = 1$ gives $\delta_0 < 1/k$. In either case, we’ve identified an initial value for which the relative error satisfies $\delta_0 < 1/k$ and doubling convergence is ensured. Once the correct degree of accuracy is attained for $\mu_0$, we scale back to $\lambda_0$ by a shift in base $\beta = k/(k-1)$. Since our scaled values are less than 1, the relative error is an upper bound on the absolute error, and the number of correct digits really does double at each iteration.
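The scaling strategy can be sketched in Python as follows. The sketch assumes $k \ge 2$ and $c_1 + \cdots + c_k \ge 1$ (the case treated above). It divides each $c_i$ by $\beta^i$ until the coefficient sum is at most 1, then chooses between $U_0 = k/(k+1)$ and $U_0 = 1$ by the sign of $h(k/(k+1))$, which is nonnegative exactly when $\mu_0 \le k/(k+1)$. It then runs Newton’s Method on $h$ and multiplies by $\beta$ once per scaling. The function name, tolerance and return values are illustrative choices.

```python
def scaled_newton_root(c, max_iter=60):
    """Dominant root of x^k - c_1 x^{k-1} - ... - c_k via the scaling strategy.

    Returns (lam0, mu0, u0): the root, the scaled root, and the starting value.
    Assumes k >= 2 and c_1 + ... + c_k >= 1.
    """
    k = len(c)
    beta = k / (k - 1)
    c = [float(ci) for ci in c]
    shifts = 0
    while sum(c) > 1:                        # x = beta*y: c_i -> c_i / beta^i
        c = [ci / beta ** (i + 1) for i, ci in enumerate(c)]
        shifts += 1

    def h_and_dh(y):                         # Horner for h and h' together
        h, dh = 1.0, 0.0
        for ci in c:
            dh = dh * y + h
            h = h * y - ci
        return h, dh

    u0 = k / (k + 1) if h_and_dh(k / (k + 1))[0] >= 0 else 1.0
    y = u0
    for _ in range(max_iter):
        h, dh = h_and_dh(y)
        step = h / dh
        y -= step
        if abs(step) <= 1e-15 * y:
            break
    return y * beta ** shifts, y, u0


print(scaled_newton_root([2, 2, 3]))   # lam0 close to 3, mu0 = 3/3.375, u0 = 1.0
```

For (5.10) three scalings by $\beta = 3/2$ give $\mu_0 = 3/3.375 \approx 0.889$, which lies above $k/(k+1) = 3/4$, so $U_0 = 1$ and $\delta_0 = 0.125 < 1/3$.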


5.5 Asymptotic Size of Solutions 


5.5.1 Homogeneous nonnegative recurrences 


In this section we consider the asymptotic size of solutions to homogeneous 
nonnegative recurrences that can be written in the form 


(HNN) $s_n = c_1s_{n-1} + c_2s_{n-2} + \cdots + c_ks_{n-k}$, where all $c_i \ge 0$ and $c_k \ne 0$.


Since the characteristic polynomial is a nonnegative polynomial, the results 
of Section 5.1 can be applied, and we let Ao be the dominant eigenvalue 
of the recurrence. Keep in mind that we’re also assuming that the initial 
conditions are nonnegative. 


Theorem 5.5.1. Let $\lambda_0$ be the dominant eigenvalue of the homogeneous nonnegative recurrence (HNN). If the initial conditions are nonnegative, then $s_n = O(\lambda_0^n)$. If in addition there are $k$ consecutive positive elements in $(s_n)$, then $s_n = \Theta(\lambda_0^n)$.

Proof. For any natural number $N$ we define the finite set of real numbers

$S_N = \{s_i/\lambda_0^i : N \le i < N + k\},$
and set $\alpha_N = \min(S_N)$, $\beta_N = \max(S_N)$. Then

(5.19) $\alpha_N\lambda_0^i \le s_i \le \beta_N\lambda_0^i$ for all $N \le i < N + k$,

and we prove that

(5.20) $s_n \le \beta_0\lambda_0^n$ for all $n \ge 0$

by induction on $n$. By construction, (5.20) holds for all $0 \le n < k$. If (5.20) holds for all $0 \le n < K$, where $K \ge k$, then

$s_K = \sum_{i=1}^{k} c_is_{K-i} \le \beta_0\sum_{i=1}^{k} c_i\lambda_0^{K-i} = \beta_0\lambda_0^K,$

since $\sum_{i=1}^{k} c_i\lambda_0^{k-i} = \lambda_0^k$. Therefore, $s_K \le \beta_0\lambda_0^K$, and we’ve proved (5.20), and so $s_n = O(\lambda_0^n)$. We can obtain $s_n \ge \alpha_0\lambda_0^n$ by a similar argument, but $\alpha_0$ might be zero. When there are $k$ consecutive positive values, say $s_N, \ldots, s_{N+k-1}$, then $\alpha_N$ is positive, and beginning the above argument at $n = N$ instead of $n = 0$ gives

$s_{N+k} = \sum_{i=0}^{k-1} c_{k-i}s_{N+i} \ge \alpha_N\sum_{i=0}^{k-1} c_{k-i}\lambda_0^{N+i} = \alpha_N\lambda_0^{N+k}.$

By induction, $s_n \ge \alpha_N\lambda_0^n$ for all $n \ge N$, and so $s_n = \Theta(\lambda_0^n)$. □


For example, the Fibonacci recurrence is a homogeneous nonnegative recurrence in which all terms after the first are positive, and this result says that $f_n = \Theta(\lambda_0^n)$, where $\lambda_0 = (1 + \sqrt5)/2$ is the unique positive eigenvalue. Another simple example is

$s_n = s_{n-1} + 2s_{n-2}$ with $s_0 = 1$, $s_1 = 2$.

Since both initial conditions are positive and $ch(x) = x^2 - x - 2 = (x + 1)(x - 2)$, the solution is $\Theta(2^n)$. In Exercise 5.3 you consider this recurrence under various choices of initial conditions, including when some initial conditions are negative.
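The constants in the proof of Theorem 5.5.1 are easy to compute. As an illustration we take $s_n = s_{n-1} + 2s_{n-2}$ with the initial conditions $s_0 = 1$, $s_1 = 0$ (one of many possible choices). For these, $\beta_0 = 1$, and the first two consecutive positive terms are $s_2 = s_3 = 2$, giving $\alpha_2 = 1/4$. The sketch uses exact rationals and assumes the caller supplies the exact dominant eigenvalue; the helper names are hypothetical.

```python
from fractions import Fraction


def terms(c, initial, n_terms):
    """First n_terms values of s_n = c_1 s_{n-1} + ... + c_k s_{n-k}."""
    s = list(initial)
    while len(s) < n_terms:
        s.append(sum(ci * s[-1 - i] for i, ci in enumerate(c)))
    return s


def theta_constants(c, initial, lam0, n_terms=40):
    """beta_0 = max S_0, the first N with k consecutive positive terms, and
    alpha_N = min S_N, where S_N = {s_i / lam0^i : N <= i < N + k}.
    lam0 must be the exact dominant eigenvalue (here an integer)."""
    k = len(c)
    s = terms(c, initial, n_terms)
    ratios = [Fraction(x) / Fraction(lam0) ** i for i, x in enumerate(s)]
    beta0 = max(ratios[:k])
    N = next(i for i in range(len(s) - k + 1) if all(x > 0 for x in s[i:i + k]))
    alphaN = min(ratios[N:N + k])
    return s, beta0, N, alphaN


s, beta0, N, alphaN = theta_constants([1, 2], [1, 0], 2)
print(beta0, N, alphaN)                                   # 1 2 1/4
print(all(x <= beta0 * 2**n for n, x in enumerate(s)),
      all(s[n] >= alphaN * 2**n for n in range(N, len(s))))   # True True
```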


Corollary 5.5.2. Let $(s_n)$ be the solution to a homogeneous nonnegative recurrence with nonnegative initial conditions, not all of which are zero. If the characteristic polynomial is primitive, then $(s_n)$ is eventually positive and $s_n = \Theta(\lambda_0^n)$.


Proof. It suffices to prove that the sequence is eventually positive. Let $s_n = c_1 s_{n-1} + \cdots + c_k s_{n-k}$. Shifting indices if necessary, we may assume that $s_0 > 0$. Setting $G = \{i : c_i > 0\}$, we see that $s_0 > 0$ implies $s_i > 0$ for all $i \in G$, and so also $s_i > 0$ for all $i$ that are nonnegative sums of elements in $G$. By primitivity, $\gcd(G) = 1$, and this in turn means that every sufficiently large integer can be expressed as a nonnegative sum of elements of $G$. (Refer to Exercise 5.5.) □
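The proof leans on the fact, developed in Exercise 5.5, that a finite set $G$ of positive integers with $\gcd(G) = 1$ misses only finitely many nonnegative sums. The Python sketch below finds, by a simple dynamic program, the largest integer up to a chosen search limit that is not such a sum. The function names and the default `limit` are our own assumptions, and the search can only certify gaps below that limit.

```python
from functools import reduce
from math import gcd


def representable(G, limit):
    """reachable[n] is True iff n (0 <= n <= limit) is a nonnegative sum of elements of G."""
    reachable = [False] * (limit + 1)
    reachable[0] = True  # the empty sum
    for n in range(1, limit + 1):
        reachable[n] = any(g <= n and reachable[n - g] for g in G)
    return reachable


def largest_gap(G, limit=1000):
    """Largest n <= limit that is NOT a nonnegative sum of elements of G.

    Assumes gcd(G) = 1 (otherwise infinitely many integers are missed) and that
    `limit` is large enough to see past the last gap. Returns -1 if there is no gap.
    """
    if reduce(gcd, G) != 1:
        raise ValueError("gcd(G) must be 1")
    reachable = representable(G, limit)
    gaps = [n for n, ok in enumerate(reachable) if not ok]
    return gaps[-1] if gaps else -1
```

For $G = \{3, 5\}$ this reports 7, consistent with Exercise 5.5, since every integer greater than $(3-1)(5-1) = 8$ is a nonnegative combination of 3 and 5; for $G = \{4, 6, 9\}$ it reports 11.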




What about periodic recurrences? A simple example is

$$s_n = 4s_{n-2},$$

which has $g = 2$, periodic characteristic polynomial $\mathrm{ch}(x) = x^2 - 4$, and dominant root $\lambda_0 = 2$. With initial conditions $s_0 = 1 = s_1$, the solution is

$$s_n = 4^{\lfloor n/2 \rfloor}$$

and $s_n = \Theta(2^n) = \Theta(\lambda_0^n)$. However, for the initial conditions $s_0 = 1$, $s_1 = 0$, the solution is

$$s_n = \begin{cases} 4^{n/2} & \text{when } n \text{ is even}, \\ 0 & \text{when } n \text{ is odd}, \end{cases}$$

and $s_n = O(2^n)$. But $s_n \ne \Theta(\lambda_0^n)$, since every odd position in the sequence is zero.

In general, a periodic recurrence whose index of imprimitivity is $g \ne 1$ can be expressed as a system of $g$ primitive equations because

$$s_n = c_g s_{n-g} + c_{2g} s_{n-2g} + \cdots + c_{rg} s_{n-rg}$$

can be written as a primitive system; namely, setting $t^{(j)}_n = s_{ng+j}$ for $0 \le j < g$,

$$\begin{aligned}
t^{(0)}_n &= c_g t^{(0)}_{n-1} + c_{2g} t^{(0)}_{n-2} + \cdots + c_{rg} t^{(0)}_{n-r}, \\
t^{(1)}_n &= c_g t^{(1)}_{n-1} + c_{2g} t^{(1)}_{n-2} + \cdots + c_{rg} t^{(1)}_{n-r}, \\
&\ \ \vdots \\
t^{(g-1)}_n &= c_g t^{(g-1)}_{n-1} + c_{2g} t^{(g-1)}_{n-2} + \cdots + c_{rg} t^{(g-1)}_{n-r},
\end{aligned}$$

where the initial conditions for each $(t^{(j)}_n)$ are the original initial conditions whose subscripts are congruent to $j \pmod g$. If all initial conditions for a particular $j$ are zero, the sequence $(t^{(j)}_n)$ is the zero sequence. Otherwise, at least one initial condition for $(t^{(j)}_n)$ is positive, and since the dominant root of each of these primitive equations is $\lambda_0^g$, the corollary gives $t^{(j)}_n = \Theta(\lambda_0^{gn})$. Translating this back to the original sequence $(s_n)$, we see that for a fixed $j$ the subsequence whose subscripts satisfy the arithmetic progression $n \equiv j \pmod g$ either is the zero sequence or grows like $\lambda_0^n$. In some sense we can therefore regard $(s_n)$ as a periodic sequence with period $g$, or under some special circumstances the period of $(s_n)$ may be a divisor of $g$.
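The splitting into $g$ primitive equations can be tried out on the example $s_n = 4s_{n-2}$, where $g = 2$. The Python sketch below (helper names are ours) generates the sequence and separates it into the subsequences $t^{(j)}_n = s_{ng+j}$. With $s_0 = 1$, $s_1 = 0$ one subsequence is $4^n$ and the other is the zero sequence, while $s_0 = s_1 = 1$ gives $4^n$ twice; in both cases the nonzero pieces grow like $(\lambda_0^g)^n = 4^n$.

```python
def solve_recurrence(coeffs, initial, n_terms):
    """Iterate s_n = c_1 s_{n-1} + ... + c_k s_{n-k} from s_0, ..., s_{k-1}."""
    k = len(coeffs)
    s = list(initial)
    while len(s) < n_terms:
        n = len(s)
        s.append(sum(coeffs[i - 1] * s[n - i] for i in range(1, k + 1)))
    return s


def decimate(seq, g):
    """Split seq into the g subsequences t^(j)_n = seq[n*g + j], j = 0, ..., g-1."""
    return [seq[j::g] for j in range(g)]


# s_n = 4 s_{n-2}: c_1 = 0, c_2 = 4, so g = 2 and each t^(j) satisfies t_n = 4 t_{n-1}.
even, odd = decimate(solve_recurrence([0, 4], [1, 0], 12), 2)
both = decimate(solve_recurrence([0, 4], [1, 1], 12), 2)
```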


5.5.2 Nonhomogeneous nonnegative equations


Here we look at certain nonhomogeneous nonnegative equations with one 
of three types of non-zero input functions. 
Theorem 5.5.3. Consider a nonnegative difference equation

(5.21) $s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} + g(n)$,

where $g(n)$ is nonnegative for each natural number $n$. Let $\lambda_0$ be the dominant eigenvalue of the recurrence. If $s_n > 0$ for all sufficiently large $n$, then

• $s_n = \Theta(\lambda_0^n)$ if $g(n) = O(\lambda_1^n)$ for some positive $\lambda_1 < \lambda_0$;
• $s_n = \Theta(g(n))$ if $g(n) = \Theta(\lambda_2^n)$ for some $\lambda_2 > \lambda_0$;
• $s_n = \Theta(n^{d+1}\lambda_0^n)$ if $g(n) = f(n)\lambda_0^n$ for a polynomial $f$ with $\deg(f) = d$.


These three types of nonnegative forcing functions correspond to the situations in which $g(n)$ is much less than $\lambda_0^n$, much greater than $\lambda_0^n$, and equal to $\lambda_0^n$ times a polynomial in $n$. For the last two types of forcing functions the hypothesis requiring $s_n$ to be eventually positive is of course superfluous. In the first type, the required eventual positivity follows if there are $k$ consecutive positive terms, which can result from positive initial conditions or positive $g(n)$ or some combination of these forms of positivity.

Notice that the solutions asymptotically behave about as one expects, except that the behavior in the third case may be a little unexpected. In that case, the response to forcing by a polynomial times $\lambda_0^n$ results in a polynomial of one higher degree times $\lambda_0^n$. The most common occurrence of this is with $g(n) = \lambda_0^n$, and we emphasize this special case because it arises so often in practice.


Corollary 5.5.4. For any positive constant $b$, any nonnegative difference equation of the form

$$s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} + b\lambda_0^n$$

has solution $s_n = \Theta(n\lambda_0^n)$, where $\lambda_0$ is the dominant eigenvalue.
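To see the three cases of Theorem 5.5.3 side by side, the Python sketch below iterates $s_n = s_{n-1} + 2s_{n-2} + g(n)$ (so $\lambda_0 = 2$) from $s_0 = s_1 = 0$ with $g(n) = 1$, $g(n) = 3^n$, and $g(n) = 2^n$. The example recurrence and forcing terms are our own choices. Working out the closed forms by hand gives limits of $1/3$ for $s_n/2^n$, $9/4$ for $s_n/3^n$, and $2/3$ for $s_n/(n2^n)$; the last ratio approaches its limit only like $1/n$, and the extra factor of $n$ is exactly the one Corollary 5.5.4 predicts.

```python
def solve_forced(coeffs, initial, g, n_terms):
    """Iterate s_n = c_1 s_{n-1} + ... + c_k s_{n-k} + g(n) from s_0, ..., s_{k-1}."""
    k = len(coeffs)
    s = list(initial)
    while len(s) < n_terms:
        n = len(s)
        s.append(sum(coeffs[i - 1] * s[n - i] for i in range(1, k + 1)) + g(n))
    return s


coeffs, start = [1, 2], [0, 0]   # s_n = s_{n-1} + 2 s_{n-2}, lambda_0 = 2

# Case A: g(n) = 1 = 1**n, lambda_1 = 1 < 2; expect s_n / 2**n -> constant.
a = solve_forced(coeffs, start, lambda n: 1, 61)
ratio_a = a[60] / 2 ** 60

# Case B: g(n) = 3**n, lambda_2 = 3 > 2; expect s_n / 3**n -> constant.
b = solve_forced(coeffs, start, lambda n: 3 ** n, 81)
ratio_b = b[80] / 3 ** 80

# Case C (Corollary 5.5.4): g(n) = 2**n; expect s_n / (n 2**n) -> constant.
c = solve_forced(coeffs, start, lambda n: 2 ** n, 201)
ratio_c = c[200] / (200 * 2 ** 200)
```

Python integers are unbounded, so the terms are computed exactly and only the final ratios are rounded.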


The remainder of this section is occupied with proving the theorem. The 
proof is a good example of the technical arguments that are often involved 
in asymptotic analysis. 


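Proof of the Theorem. Let $N$ be such that $s_n \ne 0$ for all $n \ge N$. Similarly to what was done for the homogeneous case, for any positive $\lambda$ we define

$$S_\lambda = \{\, s_i/\lambda^i : N \le i < N+k \,\},$$

and set $\alpha_\lambda = \min(S_\lambda)$, $\beta_\lambda = \max(S_\lambda)$. (Notice that $\alpha_\lambda$ and $\beta_\lambda$ are functions of $\lambda$.) In particular,

(5.22) $\alpha_\lambda \lambda^i \le s_i \le \beta_\lambda \lambda^i$ for all $N \le i < N+k$.

Case A. $g(n) = O(\lambda_1^n)$ for some positive $\lambda_1 < \lambda_0$. Setting $\alpha_0 = \alpha_{\lambda_0}$ and $\beta_0 = \beta_{\lambda_0}$,

$$\alpha_0 \lambda_0^n \le s_n \le \beta_0 \lambda_0^n \quad\text{for all } N \le n < N+k.$$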


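As in the proof of the homogeneous case, if $s_n \ge \alpha_0 \lambda_0^n$ for all $N \le n < K$, then

$$s_K = \sum_{i=1}^{k} c_i s_{K-i} + g(K) \ge \alpha_0 \sum_{i=1}^{k} c_i \lambda_0^{K-i} = \alpha_0 \lambda_0^K,$$

since $g(K) \ge 0$ and $\mathrm{ch}(\lambda_0) = 0$. Therefore, by induction we have that $s_n \ge \alpha_0 \lambda_0^n$ for all $n \ge N$.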

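Since $g(n) = O(\lambda_1^n)$, there exist positive $C$, $N_1$ such that $0 \le g(n) \le C\lambda_1^n$ for all $n \ge N_1$, and we can increase $C$ if necessary (when $N < N_1$) to obtain

$$0 \le g(n) \le C\lambda_1^n \quad\text{for all } n \ge N.$$

From the fact that $\lambda_1 < \lambda_0$, we have $\mathrm{ch}(\lambda_1) < 0$, so $\gamma_1 = -C\lambda_1^k/\mathrm{ch}(\lambda_1)$ is a positive constant. (Recall Exercise 5.2.) Setting $\beta_1 = \beta_0 + \gamma_1$, we'll prove that

$$s_n \le \beta_1 \lambda_0^n - \gamma_1 \lambda_1^n \quad\text{for all } n \ge N$$

by induction on $n$ and thereby obtain $s_n = O(\lambda_0^n)$. For this, we note that

$$\beta_1 \lambda_0^n - \gamma_1 \lambda_1^n = \beta_0 \lambda_0^n + \gamma_1(\lambda_0^n - \lambda_1^n) \ge \beta_0 \lambda_0^n \ge s_n \quad\text{for all } N \le n < N+k.$$

Suppose we've shown that $s_n \le \beta_1 \lambda_0^n - \gamma_1 \lambda_1^n$ for all $N \le n < K$. Then

$$\begin{aligned}
s_K &= \sum_{i=1}^{k} c_i s_{K-i} + g(K) \\
&\le \sum_{i=1}^{k} c_i \left(\beta_1 \lambda_0^{K-i} - \gamma_1 \lambda_1^{K-i}\right) + C\lambda_1^K \\
&= \beta_1 \sum_{i=1}^{k} c_i \lambda_0^{K-i} - \gamma_1 \sum_{i=1}^{k} c_i \lambda_1^{K-i} + C\lambda_1^K \\
&= \beta_1 \lambda_0^K + \gamma_1 \lambda_1^{K-k}\left(\mathrm{ch}(\lambda_1) - \lambda_1^k\right) + C\lambda_1^K \\
&= \beta_1 \lambda_0^K - \gamma_1 \lambda_1^K + \lambda_1^{K-k}\left(\gamma_1 \mathrm{ch}(\lambda_1) + C\lambda_1^k\right) \\
&= \beta_1 \lambda_0^K - \gamma_1 \lambda_1^K,
\end{aligned}$$

by construction of $\gamma_1$. Together with the lower bound, $s_n = \Theta(\lambda_0^n)$ does hold.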
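Case B. $g(n) = \Theta(\lambda_2^n)$ for $\lambda_2 > \lambda_0$. Let $\alpha, \beta > 0$ be such that

$$\alpha\lambda_2^n \le g(n) \le \beta\lambda_2^n \quad\text{for all } n \ge N_1,$$

and, as in Case A, increase $\beta$ if necessary so that the upper bound also holds for all $n \ge N$. For $n \ge N_1$,

$$s_n = \sum_{i=1}^{k} c_i s_{n-i} + g(n) \ge g(n) \ge \alpha\lambda_2^n,$$

giving a lower bound on $s_n$. For an upper bound on $s_n$, from (5.22) with $\lambda = \lambda_2$ we obtain $\beta_2$ such that

(5.23) $s_n \le \beta_2 \lambda_2^n$ for all $N \le n < N+k$.

Increase $\beta_2$ if necessary to obtain $\beta_2 \ge \lambda_2^k\beta/\mathrm{ch}(\lambda_2)$, which is positive because $\lambda_2 > \lambda_0$. Recalling (5.23), we assume $s_n \le \beta_2 \lambda_2^n$ for all $N \le n < M$ and prove that this also holds for $n = M$, since

$$\begin{aligned}
s_M &= \sum_{i=1}^{k} c_i s_{M-i} + g(M) \\
&\le \sum_{i=1}^{k} c_i \beta_2 \lambda_2^{M-i} + \beta\lambda_2^M \\
&= \beta_2 \lambda_2^{M-k} \sum_{i=1}^{k} c_i \lambda_2^{k-i} + \beta\lambda_2^M \\
&= \beta_2 \lambda_2^{M-k}\left(\lambda_2^k - \mathrm{ch}(\lambda_2)\right) + \beta\lambda_2^M \\
&= \beta_2 \lambda_2^M + \lambda_2^{M-k}\left(\beta\lambda_2^k - \beta_2 \mathrm{ch}(\lambda_2)\right) \le \beta_2 \lambda_2^M,
\end{aligned}$$

from the choice of $\beta_2$. Therefore, for all $n \ge \max(N, N_1)$ we have

$$\alpha\lambda_2^n \le s_n \le \beta_2 \lambda_2^n,$$

and we've proved that $s_n = \Theta(\lambda_2^n)$, that is, $s_n = \Theta(g(n))$.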


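Case C. $g(n) = \lambda_0^n f(n)$. For this case we divide the recurrence in (5.21) by $\lambda_0^n$ to get

$$\frac{s_n}{\lambda_0^n} = \sum_{i=1}^{k} \frac{c_i}{\lambda_0^i}\,\frac{s_{n-i}}{\lambda_0^{n-i}} + f(n).$$

Defining $t_n = s_n/\lambda_0^n$ and $b_i = c_i/\lambda_0^i$, this becomes the new recurrence

$$t_n = \sum_{i=1}^{k} b_i t_{n-i} + f(n),$$

where we note that

$$\sum_{i=1}^{k} b_i = \sum_{i=1}^{k} \frac{c_i}{\lambda_0^i} = 1,$$

since $\lambda_0^k = \sum_{i=1}^{k} c_i \lambda_0^{k-i}$. Because $s_n = t_n \lambda_0^n$, it suffices to prove $t_n = \Theta(n^{d+1})$. Since $f(n) = \Theta(n^d)$, there exist positive constants $\gamma_1, \gamma_2$ and a positive integer $N$ (which we may take large enough that also $s_n \ne 0$ for $n \ge N$) such that $\gamma_1 n^d \le f(n) \le \gamma_2 n^d$ for all $n \ge N$. Define $S = \{\, t_n/n^{d+1} : N \le n < N+k \,\}$ and let $\alpha > 0$ be the minimal element of $S \cup \{\gamma_1/(k+1)^{d+1}\}$ and $\beta$ the maximal element of $S \cup \{\gamma_2\}$. We'll prove that $\alpha n^{d+1} \le t_n \le \beta n^{d+1}$ by induction on $n$; by the construction of $\alpha$ and $\beta$ this holds for all $N \le n < N+k$. Assuming $\alpha n^{d+1} \le t_n \le \beta n^{d+1}$ for all $N \le n < K$, we prove that this inequality also holds for $n = K$. Then

$$\sum_{i=1}^{k} b_i \alpha (K-i)^{d+1} + \gamma_1 K^d \le t_K = \sum_{i=1}^{k} b_i t_{K-i} + f(K) \le \sum_{i=1}^{k} b_i \beta (K-i)^{d+1} + \gamma_2 K^d,$$

where $K-k \le K-i \le K-1$ and $\sum_{i=1}^{k} b_i = 1$ give

$$\alpha(K-k)^{d+1} + \gamma_1 K^d \le t_K \le \beta(K-1)^{d+1} + \gamma_2 K^d.$$

Therefore, the choice of $\gamma_2 \le \beta$ implies

$$t_K \le \beta K^d (K-1+1) = \beta K^{d+1},$$

and $\alpha \le \gamma_1/(k+1)^{d+1}$ gives

$$t_K \ge \alpha\left((K-k)^{d+1} + (k+1)^{d+1} K^d\right) \ge \alpha K^{d+1}$$

by Exercise 5.18. □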


There are many variations of the results in Theorem 5.5.3. Because they 
can be proved in essentially the same way as the results given in the theo- 
rem, we simply state them here and leave their proofs as exercises. 


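• If $g(n) = O(\lambda_1^n)$ for $\lambda_1 < \lambda_0$, then $s_n = O(\lambda_0^n)$.
  (a) If the difference equation is primitive and $g(n) = O(\lambda_1^n)$ for $\lambda_1 < \lambda_0$, then $s_n = \Theta(\lambda_0^n)$.
  (b) If $s_n > 0$ for $k$ consecutive values of $n$ and $g(n) = O(\lambda_1^n)$ for $\lambda_1 < \lambda_0$, then $s_n = \Theta(\lambda_0^n)$.
• Let $G(n) = \max\{g(n), \lambda_2 g(n-1), \ldots, \lambda_2^{n-1} g(1)\}$. If $g(n) = \Omega(\lambda_2^n)$ for $\lambda_2 > \lambda_0$, then:
  (a) $s_n = \Omega(\lambda_2^n)$;
  (b) $s_n = \Omega(g(n))$;
  (c) $s_n = O(G(n))$;
  (d) if $G(n) = O(g(n))$, then $s_n = \Theta(g(n))$;
  (e) if $g(n) = \Omega(G(n))$, then $s_n = \Theta(G(n))$.
• If $g(n) = h(n)\lambda_0^n$ for some nonnegative $h(n)$, then $s_n = O(n g(n))$ and $s_n = \Omega(g(n))$.
• If $g(n) = h(n)\lambda_0^n$ for some nonnegative $h(n)$ that is nondecreasing, then $s_n = \Theta\big(\sum_{j=1}^{n} \lambda_0^{n-j} g(j)\big)$.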
5.6 Exercises


Ex 5.1. Let $p(x)$ be a polynomial, and let $\lambda$ be a root of $p(x)$ whose multiplicity is $m \ge 2$. Show that $\lambda$ is a root of $p'(x)$ with multiplicity $m-1$.

Ex 5.2. Show that if $p(x)$ is a nonnegative polynomial with positive root $\lambda_0$, then the values of $p(x)$ change from negative to positive at $x = \lambda_0$. From this conclude that for $x > 0$,

$$\mathrm{ch}(x) > 0 \iff x > \lambda_0$$

by showing that $p'(x)$ is positive for all $x > \lambda_0$.
Ex 5.3. Consider the recurrence $s_n = s_{n-1} + 2s_{n-2}$.
(a) Show that $s_n = \Theta(2^n)$ for $s_0 = 0$, $s_1 = 15$.
(b) When $s_0 = -4$ and $s_1 = 7$, show that $s_n > 0$ for all $n \ge 3$, and $s_n = \Theta(2^n)$ still holds.
(c) When $s_0 = -1$, $s_1 = 1$, show that $s_n \ne \Theta(2^n)$.


Ex 5.4. In this problem we consider the $k$th order Fibonacci recurrence

$$f_n = f_{n-1} + f_{n-2} + \cdots + f_{n-k}$$

with primitive characteristic polynomial and dominant root $\lambda_0$.
(a) Use Newton's Method initialized at $x_0 = 2$ to show that $\lambda_0 < 2 - 1/(2^k - 1)$.
(b) Show that $\lambda_0 > 2 - 2/(k+1)$.
(c) (HARDER) Show that $\lambda_0 > 2 - 2/2^k$.


Ex 5.5. Let $a$ and $b$ be fixed positive integers.
(a) Show that an integer $n$ is expressible as a nonnegative combination of $a$ and $b$ (that is, there exist nonnegative integers $i, j$ such that $n = ia + jb$) iff $n$ is in one of the following sequences:

0, b, 2b, ...,
a, b + a, 2b + a, ...,
2a, b + 2a, 2b + 2a, ...,
⋮
(b − 1)a, b + (b − 1)a, 2b + (b − 1)a, ....

(b) When $\gcd(a, b) = 1$, show that each of the above sequences is in a different congruence class modulo $b$. Use this to show that every integer $n$ greater than $(a-1)(b-1)$ is in one of these sequences and so can be written as a nonnegative combination of $a$ and $b$.
(c) If $G$ is any finite set of positive integers with $\gcd(G) = 1$, show that every sufficiently large integer can be written as a nonnegative sum of elements in $G$.
Ex 5.6. Let $G$ be a finite set of positive integers and let $g = \gcd(G)$ be the greatest common divisor (gcd) of the elements of $G$. Show that $\zeta$ is an $i$th root of unity for every $i \in G$ iff $\zeta$ is a $g$th root of unity.


Ex 5.7. Show that
(a) $p(x) = (x-3)(x+1)^3$ is a primitive nonnegative polynomial;
(b) $p(x) = x^4 - x^3 - 5x^2 - x - 6$ is a primitive nonnegative polynomial with two roots on the unit circle.


Ex 5.8. Consider 


Show that $\lambda_0 = 2$ is the unique positive root of $p(x)$ and that there exists a quadratic polynomial $q(x)$ such that $p(x) = (x-2)(q(x))^2$, where every root $\lambda$ of $q(x)$ satisfies $|\lambda| = 5/2$. With this you see that even a primitive nonnegative polynomial can have subdominant eigenvalues that are not simple.


Ex 5.9. If $\sum_{i=1}^{k} |b_i| \le 1$, show that every solution to

$$y_n = b_1 y_{n-1} + \cdots + b_k y_{n-k}$$

satisfies $|y_n| \le \max\{|y_0|, |y_1|, \ldots, |y_{k-1}|\}$.


Ex 5.10. In this problem we consider

$$s_n = 2s_{n-1} + 2s_{n-2} + 3s_{n-3}.$$

Use the method of Exercise 4.22 to show that the coefficient of the dominant eigenvalue in the closed form of the solution is $a = (s_0 + s_1 + s_2)/13$.


Ex 5.11. Show that the sequence of deviations is periodic for the solution to $s_n = 2s_{n-1} + 2s_{n-2} + 3s_{n-3}$ with $s_0 = 0$, $s_1 = 3$, $s_2 = 9$.


Ex 5.12. Show that the nonnegative solutions to

$$x_n = 5x_{n-1} + 4x_{n-2}$$

converge to $\alpha\lambda_0^n$ in the absolute value sense. If the initial conditions are nonnegative integers, will $x_n = \mathrm{Round}(\alpha\lambda_0^n)$?


Ex 5.13. Let $p(x) = x^k + c_1 x^{k-1} + \cdots + c_{k-1} x + c_k$ be a polynomial with integer coefficients.
(a) Show that any real root of $p(x)$ is either an integer or is an irrational number.
(b) Give an efficient algorithm to determine whether or not the dominant root of a nonnegative polynomial with integer coefficients is an integer.
Ex 5.14. Let $p(x)$ be a monic nonnegative polynomial with dominant root $\lambda_0$. Show that each $H_i$ generated by Horner's Method in the evaluation of $p(x)$ at $\lambda_0$ is positive.


Ex 5.15. Show that the polynomial $f(x) = x^3 + 3x^2 - 13x + 17$ has one real root and it is negative. Try various positive initial conditions to see if Newton's method converges to the root. Conclude from this that the assumption of nonnegativity is needed to ensure that any initialization of Newton's method above the root will converge to the root.


Ex 5.16. Show that there exists an initial value such that Newton's Method applied to $f(x) = x^3 + 3x^2 - 13x + 17$ oscillates with period 3.


Ex 5.17. Show that applying Newton's Method to the Fibonacci polynomial $x^2 - x - 1$ with initial condition 2 produces the sequence $(f_{2^i+1}/f_{2^i})$ converging to the dominant root $(1+\sqrt{5})/2$.


Ex 5.18. Write $n^{d+1}$ as $[(n-k)+k]^{d+1}$ and use the binomial expansions of this and of $(k+1)^{d+1}$ to show that

$$(n-k)^{d+1} \ge n^{d+1} - (k+1)^{d+1} n^d.$$


Ex 5.19. Find the asymptotic size of solutions to

$$s_n = 2s_{n-1} + 2s_{n-2} + 3s_{n-3} + g(n)$$

with nonnegative initial conditions for each of the following choices of $g(n)$: $g(n) = 2^n$; $g(n) = 4^n$; $g(n) = 7 \cdot 3^n$.

Ex 5.20. Find the asymptotic size of the solution to

$$f_n = f_{n-1} + f_{n-2} + g(n), \qquad f_0 = 0,\ f_1 = 1,$$

where $g(n)$ satisfies the recurrence

$$g(n) = g(n-1) + g(n-2), \qquad g(0) = 0,\ g(1) = 1.$$

Ex 5.21. Let $(s_n)$ be the solution to a nonnegative recurrence

$$s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} + g(n).$$

Show that:

(a) If $g(n) = O(\lambda_1^n)$ for $\lambda_1 < \lambda_0$, then $s_n = O(\lambda_0^n)$.

(b) If the recurrence is primitive and $g(n) = O(\lambda_1^n)$ for $\lambda_1 < \lambda_0$, then $s_n = \Theta(\lambda_0^n)$.

(c) If $s_n > 0$ for $k$ consecutive values of $n$ and $g(n) = O(\lambda_1^n)$ for $\lambda_1 < \lambda_0$, then $s_n = \Theta(\lambda_0^n)$.
Ex 5.22. Let $(s_n)$ be a solution of the nonnegative recurrence

$$s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} + g(n).$$

Let $G(n) = \max\{g(n), \lambda_2 g(n-1), \ldots, \lambda_2^{n-1} g(1)\}$. If $g(n) = \Omega(\lambda_2^n)$ for $\lambda_2 > \lambda_0$, show that:

(a) $s_n = \Omega(\lambda_2^n)$.

(b) $s_n = \Omega(g(n))$.

(c) $s_n = O(G(n))$.

(d) If $G(n) = O(g(n))$, then $s_n = \Theta(g(n))$.

(e) If $g(n) = \Omega(G(n))$, then $s_n = \Theta(G(n))$.


Ex 5.23. Let $(s_n)$ be a solution of the nonnegative recurrence

$$s_n = c_1 s_{n-1} + c_2 s_{n-2} + \cdots + c_k s_{n-k} + g(n).$$

Show that:
(a) If $g(n) = h(n)\lambda_0^n$ for some nonnegative $h(n)$, then $s_n = O(n g(n))$ and $s_n = \Omega(g(n))$.
(b) If $g(n) = h(n)\lambda_0^n$ for some nonnegative $h(n)$ that is nondecreasing, then $s_n = \Theta\big(\sum_{j=1}^{n} \lambda_0^{n-j} g(j)\big)$.


6 
Leslie’s Population Matrix Model 


6.1 Leslie’s Model 


In the Fibonacci story (see Chapter 1) the rabbits are immortal, but with a few exceptions, like the Energizer Bunny, we know that real rabbits have small finite lifetimes. We can make this model more reasonable by reinterpreting the meaning of the age classes while still maintaining the mathematical form of the model. Recalling the original model, if $A_t$ is the number of adult pairs at time $t$, and $Y_t$ is the number of young (not yet breeding) pairs at time $t$, then

$$\begin{pmatrix} A_{t+1} \\ Y_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} A_t \\ Y_t \end{pmatrix}.$$

The assumed immortality of the adults is represented by the 1 in the upper left entry of the matrix. The young disappear by becoming adults, but they are replaced by each adult pair producing a new young pair. What happens if we simply interchange adults and young, leaving the matrix alone? We get

$$\begin{pmatrix} Y_{t+1} \\ A_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} Y_t \\ A_t \end{pmatrix},$$


but our interpretation is now quite different. New young are produced by 
both young and adults. The adults disappear, but they are replaced by 
youths who grow into adulthood. So by reinterpreting this model we can 
replace the unrealistic assumption of immortality with the biologically re- 
alistic assumptions of aging and death. As a leading biologist said at a 
recent meeting, “The facts are always changing, but a good model goes on
forever.” 

The reinterpreted model still has some unrealistic features. Should every youth survive to adulthood? Should the number of offspring produced by an adult pair and a young pair be the same? The model can be generalized to avoid these unrealistic features by replacing the 1's in the matrix with other constants. Since the lower left 1 represents a youth surviving to become an adult, we could replace this 1 by $s$, which is called the survival rate, and reasonably assume that $1 \ge s > 0$. The 1's in the top row represent the number of offspring produced by a youth and an adult respectively. We can replace these 1's with the fertility rates $f_1$ and $f_2$ and simply assume that $f_1 \ge 0$ and $f_2 \ge 0$. The generalized model is

$$\begin{pmatrix} Y_{t+1} \\ A_{t+1} \end{pmatrix} = \begin{pmatrix} f_1 & f_2 \\ s & 0 \end{pmatrix} \begin{pmatrix} Y_t \\ A_t \end{pmatrix}.$$

Several special cases suggest themselves. What happens if $f_1 = f_2 = 0$? Then, if we start with the population $(Y_t, A_t)^T$, we next get $(0, sY_t)^T$ and then $(0, 0)^T$. So as expected, if the fertility rates are 0, the population dies out, and in this model the population dies out in two time steps. What happens if $f_2 = 0$, but $f_1 \ne 0$? Then starting with $(Y_t, A_t)^T$, we get $(f_1 Y_t, sY_t)^T$ and then $(f_1^2 Y_t, sf_1 Y_t)^T$. These last two vectors can be rewritten as $Y_t (f_1, s)^T$ and $f_1 Y_t (f_1, s)^T$. Notice that these are both multiples of the vector $(f_1, s)^T$ and that neither depends on $A_t$. These observations agree with the expectations that the individuals past reproductive age eventually have no effect on the composition of the population, and the eventual shape of the population depends on the structural parameters $f_1$, $f_2$, and $s$, and not on the initial population. What happens if $f_1 = 0$ and $f_2 \ne 0$? Then, starting with $(Y_t, A_t)^T$, we get $(f_2 A_t, sY_t)^T$ and then $(sf_2 Y_t, sf_2 A_t)^T = sf_2 (Y_t, A_t)^T$. In contrast to the other cases, the shape of the population oscillates between multiples of $(Y_t, A_t)^T$ and $(f_2 A_t, sY_t)^T$. In the second and third cases the population may increase or decrease depending on whether the multiplier, $f_1$ or $sf_2$, is or is not larger than 1. These cases show that this simple model does have some features that we want in a population model. The population size may increase or decrease. The population may die out. The shape of the population may approach a stable form, or the shape of the population may oscillate. By doing some simple calculations on the parameters, we would like to be able to determine which of these various situations occurs.
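The three special cases can be reproduced by iterating the generalized $2 \times 2$ model directly. This Python sketch uses exact fractions; the starting population $(Y_t, A_t) = (2, 7)$ and the rates $s = 1/2$, $f_1 = 3$, $f_2 = 4$ are arbitrary illustrative values, not taken from the text, and the function names are ours.

```python
from fractions import Fraction


def step(f1, f2, s, pop):
    """One step of the generalized model: (Y, A) -> (f1*Y + f2*A, s*Y)."""
    y, a = pop
    return (f1 * y + f2 * a, s * y)


def orbit(f1, f2, s, pop, steps):
    """Return [pop, L pop, L^2 pop, ...] with `steps` applications of the matrix."""
    out = [pop]
    for _ in range(steps):
        out.append(step(f1, f2, s, out[-1]))
    return out


half = Fraction(1, 2)
start = (2, 7)                         # (Y_t, A_t), chosen arbitrarily

dies = orbit(0, 0, half, start, 2)     # f1 = f2 = 0: extinct after two steps
stable = orbit(3, 0, half, start, 2)   # f2 = 0: each later vector is a multiple of (f1, s)
swings = orbit(0, 4, half, start, 2)   # f1 = 0: two steps multiply the start by s*f2 = 2
```

Checking that $y \cdot s = a \cdot f_1$ for each later vector of `stable` is a division-free way to confirm that it is a multiple of $(f_1, s)^T$.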
We can generalize the Fibonacci model with two age classes to a model
with k age classes. In population biology the model with k age classes 
is usually called Leslie’s model. In 1945, Leslie [95] published one of 
the most influential papers in population biology. In it he introduced a 
generation of biologists to vectors and matrices. The model Leslie described 
is quite similar to the renewal model [62] and the method of life tables, 
which were devised in the 1700’s by such mathematicians as the Bernoullis, 
Euler, and Halley. (For more on the history, consult Boyer’s History of 
Mathematics [12].) These methods form the basis of an entire industry— 
insurance. Further, these methods were not unknown to earlier biologists. 
For example, Lotka’s 1925 book, Elements of Mathematical Biology [101], 
devotes an entire chapter to them. On the other hand, Leslie presented a 
concise formulation of the model, and the mid-twentieth century generation 
of biologists was a receptive audience. 

The model can be concisely stated as 


$$X_{t+1} = LX_t,$$


where $X_t$ and $X_{t+1}$ are population vectors and $L$ is a Leslie matrix (which is explicitly defined below). The model assumes discrete time, and the time
unit must be chosen appropriately for the organism being modeled. For 
bacteria the time unit might be 20 minutes. For many insects an appropri- 
ate time unit might be one week. For many vertebrates a one-year time unit 
might be used. For human populations a five-year time unit is often used. 
The population vectors have a number of components. Each component is 
the number of individuals of a particular age. For example, if 


$$X_t = (x_1, x_2, x_3, x_4)^T,$$


then at time $t$, $x_1$ is the number of individuals in the first age class, $x_2$ is the number of individuals in the second age class, and so forth. Said another way, $x_i$ is the number of individuals of age $i-1$, because newborns are usually assigned age 0 and not age 1. If one is not really happy with the discrete-time assumption and assumes that discrete time is only an approximation to an underlying continuous time, then one could say that $x_i$ represents the individuals with ages $a$ for which $i-1 < a \le i$. One could instead include in $x_i$ the ages for which $i-1 \le a < i$, with newborns included in $x_1$. Or one could say that ages for which $i-1 < a < i$ are represented by $x_i$ and claim that there is no ambiguity because there are no individuals of exactly age $i$ for any $i$. The main point is that the width
of an age class is one time unit. If the data are arranged for five-year age 
classes, this model will not calculate the population even one year in the 
future. 
Generalizing from the Fibonacci example, a Leslie matrix contains both
survival rates and fertility rates; specifically, 


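$$L = \begin{pmatrix}
f_1 & f_2 & \cdots & f_{k-1} & f_k \\
s_1 & 0 & \cdots & 0 & 0 \\
0 & s_2 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & s_{k-1} & 0
\end{pmatrix}.$$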


The first row of $L$ consists of fertility rates, where $f_i$ is the number of offspring (newborns) produced by an individual of age class $i$ in one time unit, and the subdiagonal of $L$ contains the survival rates, where $s_i$ is the probability that an individual in age class $i$ will survive to age class $i+1$. All other entries in $L$ are zero.
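For experiments with larger models it helps to build $L$ from its two lists of parameters. Here is a minimal Python sketch, assuming the fertility rates are given as a list $[f_1, \ldots, f_k]$ and the survival rates as $[s_1, \ldots, s_{k-1}]$; the function names are ours.

```python
from fractions import Fraction


def leslie_matrix(fertility, survival):
    """Build the k x k Leslie matrix as a list of rows.

    fertility -- [f_1, ..., f_k], placed in the first row
    survival  -- [s_1, ..., s_{k-1}], placed on the subdiagonal
    """
    k = len(fertility)
    if len(survival) != k - 1:
        raise ValueError("expected k - 1 survival rates for k fertility rates")
    L = [[0] * k for _ in range(k)]
    L[0] = list(fertility)
    for i, s in enumerate(survival):
        L[i + 1][i] = s  # s_{i+1}: survival from age class i+1 to age class i+2
    return L


def project(L, x, steps):
    """Apply X_{t+1} = L X_t `steps` times and return the result."""
    for _ in range(steps):
        x = [sum(a * b for a, b in zip(row, x)) for row in L]
    return x


L = leslie_matrix([0, 1, 2], [Fraction(1, 2), Fraction(1, 3)])
```

With $f = (0, 1, 2)$ and $s = (1/2, 1/3)$, six newborns become three individuals in age class 2 after one step; one step later those three produce three newborns while one of them survives into age class 3.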

The usual assumptions on these parameters are that for each $i$, $0 < s_i \le 1$ and $f_i \ge 0$. The first assumption makes sense if one interprets $s_i$ as a probability and assumes that there is some possibility for an individual to survive a particular age class into the next age class. Further, if any $s_i$ were zero, then in $k-i$ steps the population would become a population in which the last $k-i$ age classes are empty, and all future developments occur within and depend only on the first $i$ age classes. For similar reasons, one usually assumes that $f_k > 0$. That is, if one or several of the oldest age classes have zero fertility, then the composition of these older age classes has no effect on the rest of the population, and in a small number of steps the composition of these age classes is determined by the younger age classes with no effect from the original composition of these oldest age classes.

An extra assumption made in Leslie’s original model and often used in 
demographic applications is that at least two adjacent fertility rates are 
positive. This assumption is often enforced by averaging fertilities. That 
is, the number of offspring from females in each age class is measured, but 
a fraction of these are attributed to females in the next age class because 
the females are assumed to be aging as the measurements are taken. A 
mathematically more appropriate assumption, which includes the Leslie 
assumption as a special case, is that there is a power of the Leslie matrix 
that is strictly positive. Luckily, this can be checked easily using the greatest 
common divisor of the indices of positive fertility rates. Later, we will prove 
the following result. 


Theorem 6.1.1. Let $L$ be a Leslie matrix. Then there exists $m > 0$ with $L^m \gg 0$ iff $\gcd\{i \mid f_i > 0\} = 1$, where $A \gg 0$ means that every entry in the matrix $A$ is strictly positive.


This theorem says that, under the gcd condition, there is a power of $L$ that is a strictly positive matrix. The following theorem gives bounds on the least exponent $m$ for which the power $L^m$ is strictly positive.
Theorem 6.1.2. Let $L$ be an $n \times n$ Leslie matrix with $\gcd\{i \mid f_i > 0\} = 1$. Let $m_0$ be the least nonnegative integer such that $L^{m_0} \gg 0$, and let $l$ be the least positive integer with $f_l > 0$. Then

$$m_0 \le l(n-2) + n \le (n-1)^2 + 1.$$

We will prove both of these theorems in Chapter 7 in the more general context of nonnegative matrices (which might not be in Leslie form). The bounds given in Theorem 6.1.2 are tight. For instance, for any $n \ge 2$ consider a Leslie matrix in which $f_1 = f_2 = \cdots = f_{n-2} = 0$, $f_{n-1} > 0$, and $f_n > 0$. Here, $\gcd(n, n-1) = 1$, and it is relatively easy to show that $m_0 = (n-1)^2 + 1$. (See Exercise 6.8.)
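Whether some power of a nonnegative matrix is strictly positive depends only on which entries are nonzero, so the least exponent $m_0$ can be found by multiplying 0/1 patterns. The Python sketch below (names ours) searches up to the bound $(n-1)^2 + 1$; for the tight family just described it should return $m_0 = 5$ when $n = 3$ and $m_0 = 10$ when $n = 4$.

```python
def bool_mat_mul(A, B):
    """Boolean matrix product: entry (i, j) is True iff some k has A[i][k] and B[k][j]."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def least_positive_power(M, max_power=None):
    """Smallest m >= 1 with every entry of M**m positive, or None if none <= max_power.

    Only the zero/nonzero pattern of a nonnegative matrix matters, so we multiply
    Boolean patterns. max_power defaults to (n - 1)**2 + 1, the bound in Theorem 6.1.4.
    """
    n = len(M)
    pattern = [[x > 0 for x in row] for row in M]
    if max_power is None:
        max_power = (n - 1) ** 2 + 1
    power = pattern                            # pattern of M**1
    for m in range(1, max_power + 1):
        if all(all(row) for row in power):
            return m
        power = bool_mat_mul(power, pattern)   # pattern of M**(m + 1)
    return None
```

This is the brute-force test mentioned in Section 6.1.1; it costs up to $(n-1)^2 + 1$ matrix products, which is why the gcd test is preferred for Leslie matrices.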

These two theorems can be generalized to apply to all nonnegative matrices, but the gcd condition must be generalized. For this we use a graphical interpretation. A nonnegative matrix $A$ is called primitive if there exists positive $m$ with $A^m \gg 0$. A graph $G$ can be associated with $A$ in which vertices $v_i$ and $v_j$ have an edge $v_i \leftarrow v_j$ iff $a_{ij} > 0$. (We chose this direction, but the other ordering, $v_i \rightarrow v_j$, could be used.) A graph is strongly connected if for each pair of vertices $v_i$ and $v_j$ there is a directed path from $v_i$ to $v_j$ (and so also a directed path from $v_j$ to $v_i$). When the directed edges are represented by arrows, a directed path always follows the arrows from tail to head. The generalized theorems are as follows:


Theorem 6.1.3. A nonnegative $n \times n$ matrix $A$ is primitive iff the associated graph $G$ is strongly connected and the greatest common divisor of the cycle lengths in $G$ is 1.


Theorem 6.1.4. If $A$ is a primitive $n \times n$ matrix, then $m_0$, the least nonnegative integer such that $A^{m_0} \gg 0$, obeys

$$m_0 \le l(n-2) + n \le (n-1)^2 + 1,$$

where $l$ is the length of the shortest cycle in the associated graph $G$.


We defer the proofs of these theorems to Chapter 7, where we will also 
describe the graph associated with a nonnegative matrix in more detail. 


6.1.1 How to tell whether a Leslie matrix is primitive 


How does one determine whether a nonnegative matrix has a power that is strictly positive? For a Leslie matrix, the strong connectedness in Theorem 6.1.3 holds, since all survival rates $s_i$ are positive (together with the usual assumption $f_n > 0$). Also, from the form of Leslie matrices we see that every cycle in the graph contains the vertex $v_1$, which means that the necessary and sufficient condition for primitivity becomes $\gcd\{i \mid f_i > 0\} = 1$ for Leslie matrices.

Notice that if $L^m$ is positive, then $L^{m_1}$ is positive for all $m_1 \ge m$. So one way to test whether a Leslie matrix is primitive is to raise the matrix to the $(n-1)^2 + 1$ power and then check whether this matrix is strictly positive. This technique will require a number of matrix multiplications and hence may not be very efficient. A more efficient algorithm is based on the gcd condition, where $\gcd\{i \mid f_i > 0\}$ is computed using the Euclidean greatest common divisor algorithm. It can be shown (see Exercise 6.2) that the Euclidean Algorithm computes the gcd of two numbers that are less than $n$ using $O(\log n)$ arithmetic operations. Since the Euclidean Algorithm is used at most $O(n)$ times, the whole algorithm takes $O(n \log n)$ arithmetic operations. The algorithm for primitivity is given in the next two boxes.


Recursive Form of the Euclidean Algorithm
for Greatest Common Divisor

$$\gcd(a, b) = \begin{cases} b & \text{if } a = 0, \\ \gcd(b \bmod a,\ a) & \text{if } a \ne 0. \end{cases}$$

Algorithm to determine whether a Leslie Matrix is primitive

    Let {i_1, i_2, ..., i_r} = {i | f_i > 0}
    G := i_1
    FOR J := 2 TO r DO
        G := gcd(G, i_J)
    ENDFOR
    IF G = 1 THEN Primitive
    ELSE Not Primitive
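A direct Python transcription of the two boxes might look as follows. It is a sketch: the early exit once the running gcd reaches 1 is our own small addition and does not change the result, and the function names are ours. The fertility rates are passed as a list $[f_1, \ldots, f_n]$, so list position $p$ holds $f_{p+1}$.

```python
def euclid_gcd(a, b):
    """Recursive Euclidean algorithm from the first box: gcd(0, b) = b."""
    if a == 0:
        return b
    return euclid_gcd(b % a, a)


def leslie_is_primitive(fertility):
    """Second box: L is primitive iff gcd{i : f_i > 0} = 1.

    fertility -- [f_1, ..., f_n]; list position p holds f_{p+1}.
    """
    indices = [i for i, f in enumerate(fertility, start=1) if f > 0]
    if not indices:
        return False  # no positive fertility rate: every power of L has a zero first row
    G = indices[0]
    for i in indices[1:]:
        G = euclid_gcd(G, i)
        if G == 1:
            break  # our addition: the gcd cannot drop below 1
    return G == 1
```

For example, $f = (0, 0, 1, 1)$ is primitive because $\gcd(3, 4) = 1$, while $f = (0, 1, 0, 1)$ is not, because $\gcd(2, 4) = 2$.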


6.2 Leslie’s Convergence Theorem 


The content of Leslie’s Convergence Theorem is that a multi-dimensional 
model asymptotically becomes a one-dimensional model. Specifically, the 
theorem states that under reasonable circumstances the population distri- 
bution will converge to a single distribution regardless of the initial distri- 
bution. In general, this convergence is convergence in the sense of relative 
error. With an additional assumption, the convergence can become conver- 
gence in the sense of absolute error. 

The asymptotic growth rate of the population is determined by a single number, $\lambda_0$, which depends on the fertility rates and survival rates but is independent of the initial population distribution. There is no closed-form formula for $\lambda_0$, but it can be calculated quickly to high numerical accuracy using Newton's method (see Section 5.3).

Will the population be asymptotically increasing or decreasing? Obviously, this depends on $\lambda_0$, but a highly accurate estimate of $\lambda_0$ is not needed to answer this question. All one needs to know is whether $\lambda_0 > 1$ or $\lambda_0 < 1$, and this can be determined in $O(n)$ arithmetic operations.
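One way to get the $O(n)$ test is sketched here under the usual assumptions ($0 < s_i \le 1$, some $f_i > 0$). The characteristic polynomial of $L$ is $\lambda^n - \sum_{i=1}^n c_i f_i \lambda^{n-i}$ with $c_1 = 1$ and $c_i = s_1 \cdots s_{i-1}$, so its value at 1 is $1 - R$ with $R = \sum_i c_i f_i$. By Exercise 5.2 this nonnegative polynomial is positive at $x > 0$ exactly when $x > \lambda_0$, so comparing $R$ with 1 decides how $\lambda_0$ compares with 1. Demographers call $R$ the net reproductive rate; the Python function names below are our own.

```python
from fractions import Fraction


def net_reproductive_rate(fertility, survival):
    """R = sum_i c_i f_i with c_1 = 1, c_i = s_1 ... s_{i-1}; a single O(n) pass."""
    total, c = 0, 1
    for i, f in enumerate(fertility):
        total += c * f
        if i < len(survival):
            c *= survival[i]
    return total


def growth_trend(fertility, survival):
    """Compare lambda_0 with 1 through the sign of ch_L(1) = 1 - R."""
    R = net_reproductive_rate(fertility, survival)
    if R > 1:
        return "increasing"   # ch_L(1) < 0, so 1 < lambda_0
    if R < 1:
        return "decreasing"   # ch_L(1) > 0, so 1 > lambda_0
    return "stationary"       # ch_L(1) = 0, so lambda_0 = 1
```

Passing survival rates as `Fraction` values keeps the borderline case $R = 1$ exact; with floating-point rates that comparison is subject to rounding.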

What will the asymptotic distribution, usually called the stable age distribution, look like? This distribution can be given as an explicit formula in $\lambda_0$, the fertility rates, and the survival rates. Further, assuming that $\lambda_0 > 1$ holds, this stable age distribution has an “inverted pyramid” form. That is, the largest age class is the newborns, and the size of each age class decreases as age increases. Here is the theorem.


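Theorem 6.2.1 (Leslie's Convergence Theorem). Let $L$ be an $n \times n$ Leslie matrix in which $\gcd\{i \mid f_i > 0\} = 1$, and let $X = (x_1, \ldots, x_n)^T$ be a nonnegative vector. Then there is a unique positive eigenvalue $\lambda_0$ of $L$ such that

$$\lim_{t\to\infty} \frac{L^t X}{\lambda_0^t} = \gamma \begin{pmatrix} c_1 \\ c_2/\lambda_0 \\ \vdots \\ c_n/\lambda_0^{n-1} \end{pmatrix},$$

where

$$c_i = \begin{cases} 1 & \text{if } i = 1, \\ s_1 s_2 \cdots s_{i-1} & \text{if } i = 2, \ldots, n, \end{cases}$$

and

$$\gamma = \sum_{j=1}^{n} \frac{\lambda_0^{j-2}\, g_j(\lambda_0)\, x_j}{c_j\, \mathrm{ch}_L'(\lambda_0)},$$

where

$$\mathrm{ch}_L(\lambda) = \lambda^n - \sum_{i=1}^{n} c_i f_i \lambda^{n-i}$$

and

$$g_j(\lambda) = \sum_{i=j}^{n} c_i f_i \lambda^{n-i}.$$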
This form of the theorem is slightly stronger than Leslie's original theorem, because he assumed that there was some $i$ such that both fertility rates $f_i$ and $f_{i+1}$ are positive. Clearly his assumption implies the gcd condition, but there are Leslie matrices that satisfy the gcd condition and do not have two adjacent positive fertility rates. We will not prove the theorem now. Rather, it will follow as a corollary to some more general theorems that we will prove later in this chapter. (See Section 6.6.) The above form for the limiting vector was chosen to display the inverted pyramid form.
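The limit in Theorem 6.2.1 can be checked numerically, using it in the form $\lim_{t\to\infty} L^tX/\lambda_0^t = \gamma\,(c_1, c_2/\lambda_0, \ldots, c_n/\lambda_0^{n-1})^T$ with $\gamma = \sum_j \lambda_0^{j-2} g_j(\lambda_0) x_j / (c_j\, \mathrm{ch}_L'(\lambda_0))$ and $g_j(\lambda) = \sum_{i \ge j} c_i f_i \lambda^{n-i}$. The Python sketch below computes $\lambda_0$ by Newton's method started at $1 + \sum_i c_i f_i$. That starting point lies above $\max(1, \sum_i c_i f_i)$, which is an upper bound for $\lambda_0$. The sketch then evaluates $\gamma$ and the limiting vector and compares them with $L^tX/\lambda_0^t$ from plain iteration. All function names are our own, and the floating-point tolerances are ad hoc.

```python
def char_coeffs(fertility, survival):
    """Return [a_1, ..., a_n] with a_i = c_i f_i, so ch_L(x) = x^n - sum a_i x^(n-i)."""
    a, c = [], 1.0
    for i, f in enumerate(fertility):
        a.append(c * f)
        if i < len(survival):
            c *= survival[i]           # c_{i+2} = s_1 ... s_{i+1} (1-based)
    return a


def ch_and_derivative(a, x):
    """Evaluate ch_L(x) and ch_L'(x) together by Horner's method."""
    p, dp = 1.0, 0.0                   # leading coefficient of x^n is 1
    for ai in a:
        dp = dp * x + p
        p = p * x - ai
    return p, dp


def dominant_root(a, tol=1e-14, max_iter=200):
    """Newton's method started above lambda_0 (which never exceeds max(1, sum a))."""
    x = 1.0 + sum(a)
    for _ in range(max_iter):
        p, dp = ch_and_derivative(a, x)
        step = p / dp
        x -= step
        if abs(step) <= tol * x:
            break
    return x


def leslie_limit(fertility, survival, x):
    """Return (lambda_0, gamma, limiting vector) from the formulas of Theorem 6.2.1."""
    n = len(fertility)
    a = char_coeffs(fertility, survival)
    lam = dominant_root(a)
    c = [1.0]
    for s in survival:
        c.append(c[-1] * s)
    _, dch = ch_and_derivative(a, lam)
    # g[j] = g_{j+1}(lam) = sum over 1-based i >= j+1 of c_i f_i lam^(n-i)
    g = [sum(a[i] * lam ** (n - 1 - i) for i in range(j, n)) for j in range(n)]
    gamma = sum(lam ** (j - 1) * g[j] * x[j] / (c[j] * dch) for j in range(n))
    return lam, gamma, [gamma * c[i] / lam ** i for i in range(n)]


def scaled_iterate(fertility, survival, x, lam, steps):
    """Compute L^t X / lam^t by repeated application of the Leslie matrix."""
    for _ in range(steps):
        births = sum(f * xi for f, xi in zip(fertility, x))
        x = [births] + [s * xi for s, xi in zip(survival, x)]
        x = [v / lam for v in x]
    return x
```

For the Fibonacci matrix ($f_1 = f_2 = 1$, $s_1 = 1$) with $X = (1, 0)^T$ this gives $\lambda_0 = (1+\sqrt{5})/2$ and $\gamma = \lambda_0/\sqrt{5}$; for $f = (0, 1, 2)$, $s = (1/2, 1/2)$ and $X = (1, 0, 0)^T$ it gives $\lambda_0 = 1$ and the limit $(0.4, 0.2, 0.1)^T$, matching the iteration.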
Corollary 6.2.2. If $\lambda_0 \ge 1$ and for each $i$, either $s_i < 1$, or $\lambda_0 \ne 1$ and $s_i \le 1$, then the limiting vector has the inverted pyramid form in which the entries of the vector decrease as one goes from top to bottom.


The other important point is that $\gamma > 0$ if $X$ is nonnegative and $X \ne 0$. This follows because $\mathrm{ch}_L'(\lambda_0) > 0$, and for each $j$ both $c_j$ and $g_j(\lambda_0)$ are positive. (For this, refer to Exercise 5.2. Also notice that if $X = 0$, then $\gamma = 0$ and $L^tX$ is always 0.) The convergence is in the relative error sense, since the theorems on nonnegative polynomials (Section 5.1) guarantee only that $\lambda_0$ is greater than the absolute value of the other eigenvalues. For convergence in the absolute error sense, one wants $\lambda_0 > 1$ and that all other eigenvalues satisfy $|\lambda| < 1$. For example, the Fibonacci matrix $F$ satisfies these hypotheses, and so not only does

$$\lim_{t\to\infty} \frac{F^tX}{\lambda_0^t} = \gamma \begin{pmatrix} 1 \\ 1/\lambda_0 \end{pmatrix}$$

hold, but also there exists a constant $\gamma$ depending only on $X$ such that

$$\lim_{t\to\infty} \left| F^tX - \gamma \lambda_0^t \begin{pmatrix} 1 \\ 1/\lambda_0 \end{pmatrix} \right| = 0,$$

where here the absolute value of a vector means the maximum absolute value of its coordinates.


6.3 Imprimitive Leslie Matrices


Not all Leslie matrices converge. Some can oscillate. The idea of using oscil- 
lating matrices for populations even predates Leslie’s paper. For example, 
Bernadelli [8] described a hypothetical population of beetles that he mod- 
eled with an oscillating matrix. The periodic cicadas (the famous 17-year 
locusts) can also be described by the oscillating matrices that we will cover 
in this section. 


6.3.1 A simple example 


Consider the two-dimensional Leslie matrix

$$M = \begin{pmatrix} 0 & 8 \\ 1/2 & 0 \end{pmatrix}.$$

There is only one positive fertility rate, $f_2$, and so $g = 2$, and we expect an oscillation of period 2. The characteristic polynomial is $\lambda^2 - 4$, and the eigenvalues are $\pm 2$, which gives

$$M^{2m} = 4^m I \quad\text{and}\quad M^{2m+1} = 4^m M.$$
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So in some sense the powers of M are periodic. To be more precise, we can 
normalize by dividing M by its positive eigenvalue, Ay = 2, giving 


M_ [0 4 
a \i/4. a\* 


a ll myer oie 
= =] acs SS 
(Fz) =F om (Z) =F. 


and clearly, 


and thus = is periodic with period 2. In general, we say that a matrix A is 
periodic with period p if A?+’ = A’ for all i > 0, and p is the smallest 
positive integer that makes the equation true. While in our example it is 
true that Go = (a¢)" for all 7, we do not say that (+) has period 
512, because the periodicity equations are also satisfied by p = 2. When we 
say that the period p equals 2, we are also implying that A!t’ 4 A’; that 
is, the matrix does not have period 1. 
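The example can be checked exactly with Python's standard `fractions` module. The helper name `period` is hypothetical. It relies on the observation that taking $i = 0$ forces $A^p = I$, and conversely $A^p = I$ gives $A^{p+i} = A^i$ for every $i$. So it suffices to search for the first $p$ with $A^p = I$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def period(a, max_p=64):
    """Smallest p >= 1 with a^(p+i) = a^i for all i >= 0, or None if p > max_p.

    Taking i = 0 shows the condition forces a^p = I, and a^p = I gives
    a^(p+i) = a^i for every i, so it is enough to look for a^p = I."""
    n = len(a)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    power = identity
    for p in range(1, max_p + 1):
        power = mat_mul(power, a)
        if power == identity:
            return p
    return None


M = [[Fraction(0), Fraction(8)],
     [Fraction(1, 2), Fraction(0)]]
lam0 = 2                                              # positive eigenvalue of M
N = [[entry / lam0 for entry in row] for row in M]   # N = M / lam0

print(period(N))   # 2
print(period(M))   # None: M^(2k) = 4^k I is never I for k >= 1
```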


6.3.2 A special case: Only one positive fertility rate


Theorem 6.3.1. Let $M$ be an $n$-dimensional Leslie matrix with $f_n$ its only non-zero fertility rate. If $\lambda_0$ is the positive eigenvalue of $M$, then $M/\lambda_0$ is periodic with period $n$. Also, $\lambda_0 = (s_1\cdots s_{n-1}f_n)^{1/n}$.

Proof. By direct calculation, the characteristic polynomial of $M$ is the nonnegative polynomial $\lambda^n - s_1\cdots s_{n-1}f_n$. So $\lambda_0 = (s_1\cdots s_{n-1}f_n)^{1/n}$ is the (unique) positive eigenvalue. By the Cayley-Hamilton Theorem (refer to Appendix C), $M^n = \lambda_0^nI$ and $n$ satisfies the periodicity equations. To see that $n$ is the smallest such positive integer, consider the orbit of $e_n$ under $M$: $e_n, Me_n, M^2e_n, \ldots, M^{n-1}e_n$, where $e_1, e_2, \ldots, e_n$ are the standard coordinate vectors. For $1 \le i \le n$,

$$M^ie_n = s_1\cdots s_{i-1}f_n\,e_i,$$

implying that the period divides $n$. Now if $M/\lambda_0$ were periodic with period $p$, then for every vector $X$, $M^pX$ would be a scalar multiple of $X$. But no two $e_i$'s are scalar multiples of one another. Therefore, for any $p < n$, $M^pe_n$ is not a scalar multiple of $e_n$, so $M/\lambda_0$ cannot have period less than $n$. (Note that our proof basically shows that for such $M$ the minimal and characteristic polynomials are equal.) □
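The two facts used in the proof can be checked on a small case. The Python sketch below uses exact arithmetic and a hypothetical $3 \times 3$ Leslie matrix with $s_1 = 1/2$, $s_2 = 1/4$ and $f_3 = 64$. Then $s_1s_2f_3 = 8$ and $\lambda_0 = 2$. These numbers are chosen only for illustration.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    return [sum(a[i][t] * v[t] for t in range(len(v))) for i in range(len(a))]


def leslie(fertility, survival):
    """Leslie matrix with fertility rates f_1..f_n in the first row and
    survival rates s_1..s_(n-1) on the subdiagonal."""
    n = len(fertility)
    m = [[Fraction(0)] * n for _ in range(n)]
    m[0] = [Fraction(f) for f in fertility]
    for i, s in enumerate(survival):
        m[i + 1][i] = Fraction(s)
    return m


# Hypothetical example: n = 3, only f_3 > 0, and s_1 s_2 f_3 = 8, so lambda_0 = 2.
n = 3
M = leslie([0, 0, 64], [Fraction(1, 2), Fraction(1, 4)])

# Cayley-Hamilton: M^n = (s_1 ... s_(n-1) f_n) I = lambda_0^n I.
Mn = M
for _ in range(n - 1):
    Mn = mat_mul(Mn, M)

# Orbit of e_n: M^i e_n = s_1 ... s_(i-1) f_n e_i for i = 1, ..., n.
v = [Fraction(0)] * (n - 1) + [Fraction(1)]
orbit = []
for _ in range(n):
    v = mat_vec(M, v)
    orbit.append(v)

print(Mn == [[8, 0, 0], [0, 8, 0], [0, 0, 8]])        # True
print(orbit == [[64, 0, 0], [0, 32, 0], [0, 0, 8]])   # True
```

For $p = 1, 2$ the vector $M^pe_3$ is a multiple of $e_1$ or $e_2$, never of $e_3$. This is exactly the step that rules out a period smaller than $n$.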


6.3.3 Asymptotically periodic Leslie matrices 


Fortunately or unfortunately, most matrices, even most Leslie matrices, are not truly periodic. Even imprimitive Leslie matrices are usually not periodic. (Recall that a matrix is primitive when there is a power of the matrix that has only positive entries.) All powers of an imprimitive Leslie matrix contain some zeros, and these zeros move through the powers of the matrix in a periodic fashion. The non-zero entries also appear periodically, but the values of the non-zero entries are changing, preventing true periodicity. If an $n \times n$ imprimitive Leslie matrix has at least two positive fertility rates, then $g$ is less than $n$, and the matrix has $g$ eigenvalues of largest magnitude and $n - g$ eigenvalues of smaller magnitude. These smaller magnitude eigenvalues prevent the matrix from being truly periodic, but we can expect the contribution from these smaller eigenvalues to disappear (at least in a ratio sense) as we take higher powers of the matrix.
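For example, consider the Leslie matrix

$$A = \begin{pmatrix}0 & 1 & 0 & 2\\ 1/2 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1/2 & 0\end{pmatrix}$$

with characteristic polynomial $\lambda^4 - \frac12\lambda^2 - \frac12$, eigenvalues $\pm 1$, $\pm i/\sqrt{2}$, and $g = 2$. In this example we expect that taking high powers of $A$ leads to a matrix that is close to a periodic matrix of period 2, and we can use the characteristic polynomial to display this behavior. Since $A^4 = \frac12(A^2 + I)$,

$$A^{k+2} - A^k = A^{k-2}(A^4 - A^2) = A^{k-2}\left(\tfrac12I - \tfrac12A^2\right) = -\tfrac12A^{k-2}(A^2 - I) = -\tfrac12\left(A^k - A^{k-2}\right).$$

Iterating this formula gives

$$A^{k+2} - A^k = \begin{cases}\left(-\tfrac12\right)^{k/2}(A^2 - I) & \text{if } k \text{ is even},\\[4pt] \left(-\tfrac12\right)^{\lfloor k/2\rfloor}A(A^2 - I) & \text{if } k \text{ is odd}.\end{cases}$$

In either case, $A^2 - I$ or $A(A^2 - I)$ is a fixed matrix independent of $k$. Since $(-\frac12)^{\lfloor k/2\rfloor}$ decreases exponentially to 0 in magnitude,

$$\lim_{k\to\infty}\left(A^{k+2} - A^k\right) = 0,$$

where this convergence is componentwise, which means that each entry in the matrix $A^{k+2} - A^k$ goes to 0. (Other notions of convergence are possible for matrices.)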
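The recursion $A^{k+2} - A^k = -\frac12(A^k - A^{k-2})$ can be confirmed with exact rational arithmetic. In this Python sketch the entries of $A$ are an assumption. They are the unique Leslie matrix with characteristic polynomial $\lambda^4 - \frac12\lambda^2 - \frac12$ for which $AX = -X$ when $X = (4, -2, 2, -1)^T$, the vector used in the period-1 argument of this section.

```python
from fractions import Fraction as Fr


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


# Assumed entries: the 4x4 Leslie matrix with characteristic polynomial
# lambda^4 - lambda^2/2 - 1/2 for which A X = -X when X = (4, -2, 2, -1)^T.
A = [[Fr(0), Fr(1), Fr(0), Fr(2)],
     [Fr(1, 2), Fr(0), Fr(0), Fr(0)],
     [Fr(0), Fr(1), Fr(0), Fr(0)],
     [Fr(0), Fr(0), Fr(1, 2), Fr(0)]]
I = [[Fr(int(i == j)) for j in range(4)] for i in range(4)]

powers = [I]                       # powers[k] = A^k, computed exactly
for _ in range(22):
    powers.append(mat_mul(powers[-1], A))

# Cayley-Hamilton: A^4 = (A^2 + I) / 2.
assert powers[4] == scale(Fr(1, 2), [[x + y for x, y in zip(r, s)]
                                     for r, s in zip(powers[2], I)])


def gap(k):
    """A^(k+2) - A^k."""
    return mat_sub(powers[k + 2], powers[k])


# Each gap is -1/2 times the gap two steps earlier.
for k in range(2, 21):
    assert gap(k) == scale(Fr(-1, 2), gap(k - 2))

# A^22 - A^20 = (-1/2)^10 (A^2 - I); its largest entry in magnitude is 2^-10.
print(max(abs(x) for row in gap(20) for x in row))   # 1/1024
```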
Using this example as our paradigm, we say that $A$ is an asymptotically periodic matrix with period $g$ if

$$\lim_{k\to\infty}\left(A^{k+g} - A^k\right) = 0$$

and $g$ is the smallest positive integer that satisfies this equation.

To show that our example $A$ has asymptotic period 2, we must show that $A$ cannot have a smaller period. If $A$ had asymptotic period 1, then for every vector $X$, we would have

$$\lim_{k\to\infty}\left(A^{k+1} - A^k\right)X = 0.$$

(The 0 on the right side is a vector, not a matrix.) Consider the vector $X = (4, -2, 2, -1)^T$. Calculation shows that $AX$ equals $-X$. Hence $A^kX = (-1)^kX$, and

$$\left(A^{k+1} - A^k\right)X = (-1)^{k+1}X - (-1)^kX = 2(-1)^{k+1}X,$$

which does not go to zero. This means that $A$ does have asymptotic period 2.
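The period-1 test can also be run directly. This Python sketch assumes the entries of $A$ are those of the Leslie matrix with characteristic polynomial $\lambda^4 - \frac12\lambda^2 - \frac12$ for which $AX = -X$. That matrix is derived from the data of the example, so it is an assumption rather than a quotation. The sketch shows that $(A^{k+1} - A^k)X$ keeps the same size for every $k$.

```python
from fractions import Fraction as Fr


def mat_vec(a, v):
    return [sum(a[i][t] * v[t] for t in range(len(v))) for i in range(len(a))]


# Assumed entries of the 4x4 example (see the lead-in).
A = [[Fr(0), Fr(1), Fr(0), Fr(2)],
     [Fr(1, 2), Fr(0), Fr(0), Fr(0)],
     [Fr(0), Fr(1), Fr(0), Fr(0)],
     [Fr(0), Fr(0), Fr(1, 2), Fr(0)]]
X = [Fr(4), Fr(-2), Fr(2), Fr(-1)]

AX = mat_vec(A, X)
print(AX == [-x for x in X])        # True: A X = -X

# (A^(k+1) - A^k) X = 2 (-1)^(k+1) X, so its size never shrinks.
v = X
gaps = []
for k in range(10):
    w = mat_vec(A, v)               # A^(k+1) X
    gaps.append([a - b for a, b in zip(w, v)])
    v = w
print([int(max(abs(x) for x in g)) for g in gaps])   # [8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
```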


6.4 Companion Matrices 


To set the stage for our proof of Leslie’s Convergence Theorem we take a 
detour through companion matrices. Our intention is to prove convergence 
results about matrices and from these to derive convergence results for 
vectors. We generalize Leslie’s theorem by considering both primitive and 
imprimitive matrices. 

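Let $A$ be a matrix in companion form; that is,

$$A = \begin{pmatrix}c_1 & c_2 & \cdots & c_{n-1} & c_n\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{pmatrix},$$

where $c_n \neq 0$. Companion matrices have features in common with Leslie matrices, and the relationship between these two types of matrices will be explained a little later. (See Section 6.6.) Companion matrices are called companion because they are closely associated with polynomials of the form $\lambda^n - c_1\lambda^{n-1} - c_2\lambda^{n-2} - \cdots - c_n$. This companionship is not just one-way, since the characteristic polynomial of $A$ is, up to sign, the polynomial to which $A$ is companion. This last remark can be verified easily by writing down the determinant and expanding it via its last column:

$$
\begin{aligned}
\operatorname{ch}_A(\lambda) = \det(A - \lambda I)
&= \begin{vmatrix}
c_1 - \lambda & c_2 & \cdots & c_{n-1} & c_n\\
1 & -\lambda & & & \\
 & \ddots & \ddots & & \\
 & & 1 & -\lambda & \\
 & & & 1 & -\lambda
\end{vmatrix}\\
&= -\lambda\begin{vmatrix}
c_1 - \lambda & c_2 & \cdots & c_{n-1}\\
1 & -\lambda & & \\
 & \ddots & \ddots & \\
 & & 1 & -\lambda
\end{vmatrix} - (-1)^nc_n\\
&= -\lambda(-1)^{n-1}\left[\lambda^{n-1} - c_1\lambda^{n-2} - \cdots - c_{n-1}\right] - (-1)^nc_n\\
&= (-1)^n\left[\lambda^n - c_1\lambda^{n-1} - \cdots - c_{n-1}\lambda - c_n\right],
\end{aligned}
$$

where as usual, the ellipses, $\ldots$, indicate an inductive argument. In what follows we will blithely ignore the $(-1)^n$, and simply say that $\operatorname{ch}_A(\lambda) = \lambda^n - c_1\lambda^{n-1} - \cdots - c_n$. In fact, we will usually drop the subscript $A$ and just say $\operatorname{ch}(\lambda) = \lambda^n - c_1\lambda^{n-1} - \cdots - c_n$.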
The Cayley-Hamilton Theorem (see Appendix C) says that every matrix satisfies its own characteristic polynomial. Here this means $\operatorname{ch}(A) = A^n - c_1A^{n-1} - c_2A^{n-2} - \cdots - c_nI = 0$, which gives a relationship between a polynomial and its companion matrix. We can use this relation to compute powers of the matrix $A$. From the relation

$$A^n = c_1A^{n-1} + c_2A^{n-2} + \cdots + c_nI \tag{6.1}$$

it follows that

$$A^{n+1} = c_1A^n + c_2A^{n-1} + \cdots + c_nA,$$

and using (6.1) again, we obtain

$$A^{n+1} = (c_1^2 + c_2)A^{n-1} + (c_1c_2 + c_3)A^{n-2} + \cdots + (c_1c_{n-1} + c_n)A + c_1c_nI.$$

Extending this example, all powers of $A$ can be expressed as linear combinations of the $0$th through $(n-1)$st powers of $A$. For any $k$ there are coefficients $\alpha_1(k), \ldots, \alpha_n(k)$ such that

$$A^k = \alpha_1(k)A^{n-1} + \alpha_2(k)A^{n-2} + \cdots + \alpha_n(k)I.$$


This formula says that the problem of computing the powers of a matrix can be reduced to computing $n$ scalar functions. Further, the formulas for computing these functions have the following simple form:

$$
\begin{aligned}
\alpha_1(k+1) &= c_1\,\alpha_1(k) + \alpha_2(k),\\
\alpha_2(k+1) &= c_2\,\alpha_1(k) + \alpha_3(k),\\
&\;\;\vdots\\
\alpha_{n-1}(k+1) &= c_{n-1}\,\alpha_1(k) + \alpha_n(k),\\
\alpha_n(k+1) &= c_n\,\alpha_1(k).
\end{aligned}
$$

This is a system of $n$ coupled difference equations, which can be separated by converting to an $n$th order difference equation, for example,

$$\alpha_1(k+1) = c_1\alpha_1(k) + c_2\alpha_1(k-1) + \cdots + c_n\alpha_1(k-n+1).$$

If one knew the solution to this difference equation, then the other scalar functions could be computed by

$$
\begin{aligned}
\alpha_2(k+1) &= c_2\alpha_1(k) + c_3\alpha_1(k-1) + \cdots + c_n\alpha_1(k-n+2),\\
&\;\;\vdots\\
\alpha_n(k+1) &= c_n\alpha_1(k).
\end{aligned}
$$

If one desired, these could also be replaced by $n$ uncoupled difference equations

$$
\begin{aligned}
\alpha_1(k+1) &= c_1\alpha_1(k) + c_2\alpha_1(k-1) + \cdots + c_n\alpha_1(k-n+1),\\
\alpha_2(k+1) &= c_1\alpha_2(k) + c_2\alpha_2(k-1) + \cdots + c_n\alpha_2(k-n+1),\\
&\;\;\vdots\\
\alpha_n(k+1) &= c_1\alpha_n(k) + c_2\alpha_n(k-1) + \cdots + c_n\alpha_n(k-n+1).
\end{aligned}
$$

Notice that these are really the same difference equation, but each copy of the equation has, in general, a different solution because the initial conditions are different for each copy.
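The coupled recurrences translate directly into code. This Python sketch uses hypothetical helper names and integer arithmetic, so comparisons are exact. It computes $\alpha_1(k), \ldots, \alpha_n(k)$, rebuilds $A^k$ from $I, A, \ldots, A^{n-1}$, and compares the result with repeated multiplication. It also checks the $n$th order equation for $\alpha_1$ for $k \ge n - 1$, the range where every shifted term $\alpha_1(k - n + 1)$ is defined.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def companion(c):
    """Companion matrix with first row c_1, ..., c_n and ones on the subdiagonal."""
    n = len(c)
    a = [[0] * n for _ in range(n)]
    a[0] = list(c)
    for i in range(1, n):
        a[i][i - 1] = 1
    return a


def alphas(c, k):
    """[alpha_1(k), ..., alpha_n(k)] with A^k = sum_j alpha_j(k) A^(n-j).

    Starts from A^0 = I (alpha = (0, ..., 0, 1)) and applies
    alpha_j(k+1) = c_j alpha_1(k) + alpha_(j+1)(k), alpha_n(k+1) = c_n alpha_1(k)."""
    n = len(c)
    alpha = [0] * (n - 1) + [1]
    for _ in range(k):
        alpha = [c[j] * alpha[0] + (alpha[j + 1] if j + 1 < n else 0)
                 for j in range(n)]
    return alpha


def power_via_alphas(c, k):
    """Rebuild A^k from alpha_1(k), ..., alpha_n(k) and I, A, ..., A^(n-1)."""
    n = len(c)
    A = companion(c)
    low = [identity(n)]
    for _ in range(n - 1):
        low.append(mat_mul(low[-1], A))
    alpha = alphas(c, k)          # alpha[j] multiplies A^(n-1-j)
    return [[sum(alpha[j] * low[n - 1 - j][r][s] for j in range(n))
             for s in range(n)] for r in range(n)]


def power_direct(c, k):
    A = companion(c)
    p = identity(len(c))
    for _ in range(k):
        p = mat_mul(p, A)
    return p


c = [2, 0, 3, 1]       # hypothetical integer coefficients c_1..c_4
print(all(power_via_alphas(c, k) == power_direct(c, k) for k in range(12)))  # True

# alpha_1 alone satisfies alpha_1(k+1) = c_1 alpha_1(k) + ... + c_n alpha_1(k-n+1).
a1 = [alphas(c, k)[0] for k in range(20)]
print(all(a1[k + 1] == sum(c[i] * a1[k - i] for i in range(len(c)))
          for k in range(len(c) - 1, 19)))   # True
```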

Although the above formulas give computationally efficient methods to compute powers of a matrix, we want a single method that gives a good estimate of the asymptotic behavior of the sequence of powers. What does $A^k$ look like when $k$ is large? In general, the answer is easy. Either all the entries in $A^k$ are large, that is, grow exponentially in magnitude as a function of $k$, or all entries of $A^k$ are small, that is, decrease exponentially in magnitude as a function of $k$. Of course, there are $A$'s that are exceptions to this general case. In fact, we would like to change $A$ so that the overall growth or decline trend is normalized out, and then see how $A^k$ behaves. To do so we assume that there is a single number $\lambda_0$ that is the largest number (in magnitude) such that $A - \lambda_0I$ is singular and then study the


150 6. Leslie’s Population Matrix Model 


behavior of $(A/\lambda_0)^k$. We expect the growth trend to be normalized out, so that $(A/\lambda_0)^k$ has a limit, and we want to compute this limit. It is important to determine or at least estimate $\lambda_0$, and we return to this estimation later.

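First, let us look for an eigenmatrix $P_\lambda$, whose form is preserved but whose magnitude is multiplied by $\lambda$ when $P_\lambda$ is multiplied by $A$. That is, we want to solve the equation

$$AP_\lambda = \lambda P_\lambda. \tag{6.2}$$

Since we are interested in computing powers of $A$, we assume that $P_\lambda$ can be represented as a polynomial in $A$. Assuming

$$P_\lambda = p_0A^{n-1} + p_1A^{n-2} + \cdots + p_{n-1}I,$$

then

$$
\begin{aligned}
\lambda P_\lambda = AP_\lambda &= p_0A^n + p_1A^{n-1} + \cdots + p_{n-1}A\\
&= (p_0c_1 + p_1)A^{n-1} + (p_0c_2 + p_2)A^{n-2} + \cdots\\
&\qquad + (p_0c_{n-1} + p_{n-1})A + p_0c_nI.
\end{aligned} \tag{6.3}
$$

Equating the coefficients of corresponding powers of $A$ on each side of (6.3) gives

$$
\begin{aligned}
p_0c_n &= p_{n-1}\lambda,\\
p_0c_{n-1} + p_{n-1} &= p_{n-2}\lambda,\\
&\;\;\vdots\\
p_0c_2 + p_2 &= p_1\lambda,
\end{aligned}
$$

$$p_0c_1 + p_1 = p_0\lambda. \tag{6.4}$$

Assuming $\lambda \neq 0$, these equations can be solved as

$$
\begin{aligned}
p_{n-1} &= p_0\frac{c_n}{\lambda},\\
p_{n-2} &= \frac{p_0c_{n-1} + p_{n-1}}{\lambda} = p_0\left(\frac{c_{n-1}}{\lambda} + \frac{c_n}{\lambda^2}\right),\\
&\;\;\vdots\\
p_1 &= \frac{p_0c_2 + p_2}{\lambda} = p_0\left(\frac{c_2}{\lambda} + \frac{c_3}{\lambda^2} + \cdots + \frac{c_n}{\lambda^{n-1}}\right).
\end{aligned} \tag{6.5}
$$

The last equation and (6.4) give

$$p_0c_1 + p_1 = p_0\left(c_1 + \frac{c_2}{\lambda} + \cdots + \frac{c_n}{\lambda^{n-1}}\right) = p_0\lambda.$$

This is a valid equation for all values of $p_0$ if $\lambda^n = c_1\lambda^{n-1} + \cdots + c_n$, which is true for every $\lambda$ that is a root of the characteristic polynomial. Of course, if $\lambda$ is not a root of the characteristic polynomial, then this last equation implies that $p_0 = 0$, and hence by (6.5) that $p_i = 0$ for all $i$.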

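The eigenmatrices $P_\lambda$ can also be expressed in terms of the values of the polynomials $g_i(\lambda)$, defined as

$$
\begin{aligned}
g_1(\lambda) &= \lambda - c_1,\\
g_2(\lambda) &= \lambda^2 - c_1\lambda - c_2 = \lambda g_1(\lambda) - c_2,\\
&\;\;\vdots\\
g_i(\lambda) &= \lambda^i - c_1\lambda^{i-1} - \cdots - c_i = \lambda g_{i-1}(\lambda) - c_i,\\
&\;\;\vdots\\
g_{n-1}(\lambda) &= \lambda^{n-1} - c_1\lambda^{n-2} - \cdots - c_{n-1},\\
g_n(\lambda) &= \lambda^n - c_1\lambda^{n-1} - \cdots - c_{n-1}\lambda - c_n = \operatorname{ch}(\lambda),
\end{aligned}
$$

which form the sequence of polynomials from Horner's method (see Section 5.3). So up to a scalar multiplier $P_\lambda$ is

$$P_\lambda = A^{n-1} + g_1(\lambda)A^{n-2} + g_2(\lambda)A^{n-3} + \cdots + g_{n-1}(\lambda)I, \tag{6.6}$$

which follows by using $\operatorname{ch}(\lambda) = 0$ and rearranging the formulas from (6.5).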
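Formula (6.6) can be evaluated with Horner's method. This Python sketch builds $P_\lambda$ for a hypothetical companion matrix whose characteristic polynomial $(\lambda - 1)(\lambda + 2)(\lambda - 3)$ has integer roots. It then checks $AP_\lambda = \lambda P_\lambda$ at each root. The helper names `horner` and `eigenmatrix` are illustrative only.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def companion(c):
    n = len(c)
    a = [[0] * n for _ in range(n)]
    a[0] = list(c)
    for i in range(1, n):
        a[i][i - 1] = 1
    return a


def horner(c, lam):
    """[g_1(lam), ..., g_n(lam)] with g_0 = 1 and g_i = lam * g_(i-1) - c_i; g_n = ch(lam)."""
    g, out = 1, []
    for ci in c:
        g = lam * g - ci
        out.append(g)
    return out


def eigenmatrix(c, lam):
    """P_lam = A^(n-1) + g_1(lam) A^(n-2) + ... + g_(n-1)(lam) I, as in (6.6)."""
    n = len(c)
    A = companion(c)
    coeffs = [1] + horner(c, lam)[:-1]      # coeffs[j] multiplies A^(n-1-j)
    low = [identity(n)]
    for _ in range(n - 1):
        low.append(mat_mul(low[-1], A))
    return [[sum(coeffs[j] * low[n - 1 - j][r][s] for j in range(n))
             for s in range(n)] for r in range(n)]


c = [2, 5, -6]     # ch(lam) = lam^3 - 2 lam^2 - 5 lam + 6 = (lam - 1)(lam + 2)(lam - 3)
A = companion(c)
for lam in (1, -2, 3):
    P = eigenmatrix(c, lam)
    assert horner(c, lam)[-1] == 0                                   # ch(lam) = 0
    assert mat_mul(A, P) == [[lam * x for x in row] for row in P]    # A P = lam P
```

If $\lambda$ is not a root, the same construction gives $(A - \lambda I)P_\lambda = \operatorname{ch}(A) - \operatorname{ch}(\lambda)I = -\operatorname{ch}(\lambda)I$ by the division algorithm and the Cayley-Hamilton Theorem. So $AP_\lambda \neq \lambda P_\lambda$; for instance $\lambda = 2$ gives $\operatorname{ch}(2) = -4$.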

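In the simple case in which $\operatorname{ch}_A(\lambda)$ has $n$ distinct roots $\lambda_1, \lambda_2, \ldots, \lambda_n$, we will show that it is easy to represent powers of $A$ in terms of the $P_\lambda$. Assume that there are $n$ scalars $a_1, a_2, \ldots, a_n$ such that

$$I = \sum_{i=1}^{n} a_iP_{\lambda_i}. \tag{6.7}$$

Then

$$A^k = A^k\sum_{i=1}^{n} a_iP_{\lambda_i} = \sum_{i=1}^{n} a_iA^kP_{\lambda_i} = \sum_{i=1}^{n} a_i\lambda_i^kP_{\lambda_i}.$$

Further, if $\lambda_1$ is a strictly dominant eigenvalue then

$$\lim_{k\to\infty}\frac{A^k}{\lambda_1^k} = \lim_{k\to\infty}\sum_{i=1}^{n} a_i\left(\frac{\lambda_i}{\lambda_1}\right)^kP_{\lambda_i} = a_1P_{\lambda_1}.$$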


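We are left with calculating $a_1$. Luckily, this is not difficult. Since $AP_{\lambda_i} = \lambda_iP_{\lambda_i}$, then $(A - \lambda_iI)P_{\lambda_i} = 0$, and for any scalars $a, b$,

$$(A - \lambda_iI)(A - \lambda_jI)(aP_{\lambda_i} + bP_{\lambda_j}) = a(A - \lambda_jI)\,0 + b(A - \lambda_iI)\,0 = 0.$$

So the matrix polynomial $(A - \lambda_2I)(A - \lambda_3I)\cdots(A - \lambda_nI)$ annihilates every linear combination of $P_{\lambda_2}, \ldots, P_{\lambda_n}$; that is,

$$(A - \lambda_2I)\cdots(A - \lambda_nI)\sum_{i=2}^{n} a_iP_{\lambda_i} = 0.$$

Now the matrix polynomial $(A - \lambda_2I)\cdots(A - \lambda_nI)$ can be written as $\dfrac{\operatorname{ch}(A)}{A - \lambda_1I}$ because the polynomial in $A$ in the denominator exactly divides the polynomial in $A$ in the numerator. Next, consider the action of $(A - \lambda_iI)$ on $P_{\lambda_1}$. We see that

$$(A - \lambda_iI)P_{\lambda_1} = AP_{\lambda_1} - \lambda_iP_{\lambda_1} = \lambda_1P_{\lambda_1} - \lambda_iP_{\lambda_1} = (\lambda_1 - \lambda_i)P_{\lambda_1}.$$

So

$$(A - \lambda_2I)(A - \lambda_3I)P_{\lambda_1} = (\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)P_{\lambda_1}$$

and

$$
\begin{aligned}
\frac{\operatorname{ch}(A)}{A - \lambda_1I}P_{\lambda_1} &= (\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)\cdots(\lambda_1 - \lambda_n)P_{\lambda_1}\\
&= \lim_{\lambda\to\lambda_1}\frac{\operatorname{ch}(\lambda)}{\lambda - \lambda_1}P_{\lambda_1}
= \lim_{\lambda\to\lambda_1}\frac{\operatorname{ch}(\lambda) - \operatorname{ch}(\lambda_1)}{\lambda - \lambda_1}P_{\lambda_1}
= \operatorname{ch}'(\lambda_1)P_{\lambda_1}.
\end{aligned}
$$

Actually, in this calculation we made use of no special property of the eigenvalue $\lambda_1$, so

$$\frac{\operatorname{ch}(A)}{A - \lambda_iI}P_{\lambda_i} = \operatorname{ch}'(\lambda_i)P_{\lambda_i}$$

holds for each eigenvalue $\lambda_i$.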
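We now want to use this result on the equation

$$I = \sum_{j=1}^{n} a_jP_{\lambda_j}$$

to compute the $a_i$'s. Multiplying both sides by $\operatorname{ch}(A)/(A - \lambda_iI)$, which annihilates every $P_{\lambda_j}$ with $j \neq i$, gives

$$\frac{\operatorname{ch}(A)}{A - \lambda_iI} = a_i\operatorname{ch}'(\lambda_i)P_{\lambda_i}. \tag{6.8}$$

Since the coefficient of $A^{n-1}$ in $\operatorname{ch}(A)/(A - \lambda_iI)$ is 1, and so is the coefficient of $A^{n-1}$ in

$$P_{\lambda_i} = A^{n-1} + g_1(\lambda_i)A^{n-2} + \cdots + g_{n-1}(\lambda_i)I,$$

equating the coefficients gives $a_i = 1/\operatorname{ch}'(\lambda_i)$. In fact, we also have $P_{\lambda_i} = \dfrac{\operatorname{ch}(A)}{A - \lambda_iI}$, which one can easily verify is the same matrix polynomial as the one we computed in (6.6). These computations demonstrate the following theorem.

Theorem 6.4.1. If $A$ is a companion matrix with $n$ distinct eigenvalues and $\lambda_1$ is a strictly dominant eigenvalue, then

$$\lim_{k\to\infty}\frac{A^k}{\lambda_1^k} = \frac{1}{\operatorname{ch}'(\lambda_1)}P_{\lambda_1}, \quad\text{where}\quad P_{\lambda_1} = \frac{\operatorname{ch}(A)}{A - \lambda_1I}.$$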
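Theorem 6.4.1 can be checked numerically on a hypothetical companion matrix with $\operatorname{ch}(\lambda) = (\lambda - 1)(\lambda + 2)(\lambda - 3)$. Its eigenvalues $1, -2, 3$ are distinct, and $\lambda_1 = 3$ is strictly dominant. This Python sketch uses exact fractions. It verifies that the coefficients $a_i = 1/\operatorname{ch}'(\lambda_i)$ reproduce $I$ and $A^k$ exactly, and then measures the distance between $A^k/3^k$ and $P_3/\operatorname{ch}'(3)$.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def companion(c):
    n = len(c)
    a = [[0] * n for _ in range(n)]
    a[0] = list(c)
    for i in range(1, n):
        a[i][i - 1] = 1
    return a


def eigenmatrix(c, lam):
    """ch(A)/(A - lam I) = A^(n-1) + g_1(lam) A^(n-2) + ... + g_(n-1)(lam) I."""
    n = len(c)
    coeffs, g = [1], 1
    for ci in c[:-1]:
        g = lam * g - ci                     # Horner: g_i = lam g_(i-1) - c_i
        coeffs.append(g)
    low = [identity(n)]
    for _ in range(n - 1):
        low.append(mat_mul(low[-1], companion(c)))
    return [[sum(coeffs[j] * low[n - 1 - j][r][s] for j in range(n))
             for s in range(n)] for r in range(n)]


def ch_prime(c, lam):
    """Derivative of ch(x) = x^n - c_1 x^(n-1) - ... - c_n at x = lam."""
    n = len(c)
    return n * lam ** (n - 1) - sum((n - i) * c[i - 1] * lam ** (n - i - 1)
                                    for i in range(1, n))


c = [2, 5, -6]          # eigenvalues 1, -2, 3; lambda_1 = 3 is strictly dominant
roots = [1, -2, 3]
n = len(c)
A = companion(c)

# a_i P_(lambda_i) with a_i = 1 / ch'(lambda_i)
terms = {lam: [[Fraction(x, ch_prime(c, lam)) for x in row]
               for row in eigenmatrix(c, lam)] for lam in roots}

# (6.7) and its consequence A^k = sum_i lambda_i^k a_i P_(lambda_i), checked exactly.
power = identity(n)
for k in range(8):
    expansion = [[sum(lam ** k * terms[lam][r][s] for lam in roots)
                  for s in range(n)] for r in range(n)]
    assert expansion == power
    power = mat_mul(power, A)

# Theorem 6.4.1: A^k / 3^k approaches P_3 / ch'(3); the error shrinks like (2/3)^k.
k = 60
Ak = identity(n)
for _ in range(k):
    Ak = mat_mul(Ak, A)
error = max(abs(Fraction(Ak[r][s], 3 ** k) - terms[3][r][s])
            for r in range(n) for s in range(n))
print(float(error))
```

The remaining error at $k = 60$ comes almost entirely from the eigenvalue $-2$, through the factor $(2/3)^{60}$.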
A few comments on this theorem are necessary. First, if the eigenvalues are distinct, isn't there only one largest eigenvalue? No, there may still be several eigenvalues with the same magnitude. For instance,

$$A = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$$

has $\operatorname{ch}_A(\lambda) = \lambda^2 - 1$, and the two eigenvalues, $\lambda_1 = 1$ and $\lambda_2 = -1$, have the same magnitude. In this case, it is easy to see that

$$A^k = \begin{cases}I & \text{if } k \text{ is even},\\ A & \text{if } k \text{ is odd},\end{cases}$$

and $A^k/\lambda_1^k$ has no limit. Taking absolute values of matrix elements does not help, because

$$\left|\frac{A^k}{\lambda_2^k}\right| = \left|A^k\right| = \begin{cases}I & \text{if } k \text{ is even},\\ A & \text{if } k \text{ is odd}.\end{cases}$$

It is possible to get a convergence theorem when absolute values are replaced by matrix norms. Recall that absolute value takes a complex number, which may be a negative real number or have a non-zero imaginary part, and returns a nonnegative real number, the magnitude of the complex number. In a similar fashion, it is possible to define a norm that takes a matrix and returns a nonnegative real number, the magnitude of the matrix, but we do not pursue matrix norms at this point. Second, why must $\lambda_1$ be distinct from the other eigenvalues? Because the limit contains $1/\operatorname{ch}'(\lambda_1)$, and if $\lambda_1$ were a multiple eigenvalue, $\operatorname{ch}'(\lambda_1)$ would be zero, and the claimed limit would not be defined. Also, there would be an inconsistency in (6.8), and the scalars in (6.7) could not be found.

There is a logical hole in what we have just done. We assumed that solving for $P_\lambda$ could be carried out by equating the coefficients of $A^j$ for each $j$ on either side of the equation $AP_\lambda = \lambda P_\lambda$. (See equation (6.2).) This is tantamount to assuming that the powers $I, A, \ldots, A^{n-1}$ are linearly independent; that is, if

$$b_1A^{n-1} + b_2A^{n-2} + \cdots + b_nI = 0,$$

then $b_1 = b_2 = \cdots = b_n = 0$. This is equivalent to the situation in which the minimal polynomial equals the characteristic polynomial, and for general matrices this assumption is not valid. For example, if $A = I$, then $1\cdot A + (-1)\cdot I = 0$, and the powers of $A$ are not linearly independent. Fortunately, we have an extra assumption about the matrices we are using. We have assumed that they are in companion form; that is,

$$A = \begin{pmatrix}c_1 & c_2 & \cdots & c_{n-1} & c_n\\ 1 & 0 & \cdots & 0 & 0\\ \vdots & \ddots & & & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{pmatrix}$$

with $c_n \neq 0$.

We now show that if $A$ is in companion form, then $A^{n-1}, A^{n-2}, \ldots, I$ are linearly independent. This means that the only choice of $b_1, b_2, \ldots, b_n$ with

$$b_1A^{n-1} + b_2A^{n-2} + \cdots + b_nI = 0 \tag{6.9}$$

has all $b_i$ equal to 0. This equation (6.9) is a matrix equation, and for it to hold as a matrix equation it must hold for each column. In particular, $A^ie_n$ is the last column of $A^i$, and (6.9) implies that

$$b_1A^{n-1}e_n + b_2A^{n-2}e_n + \cdots + b_nIe_n = 0. \tag{6.10}$$

We claim that for $1 \le i \le n - 1$, $A^ie_n$ has the form

$$A^ie_n = (*, \ldots, *, c_n, 0, \ldots, 0)^T;$$

that is, the $i$th entry is $c_n$ and the last $n - i$ entries are 0. This claim is valid for $i = 1$, since the last column of $A$ is simply $(c_n, 0, \ldots, 0)^T$. Assuming that the last column has the claimed form for $A^i$, the last column of $A^{i+1}$ is given by

$$A^{i+1}e_n = A\,(A^ie_n) = A\begin{pmatrix}*\\ \vdots\\ *\\ c_n\\ 0\\ \vdots\\ 0\end{pmatrix} = \begin{pmatrix}*\\ *\\ \vdots\\ *\\ c_n\\ 0\\ \vdots\\ 0\end{pmatrix},$$

since multiplication by $A$ shifts every entry down one position and puts a new value in the first position, so $c_n$ moves to position $i + 1$. Hence the claim holds for $i = 1$ through $i = n - 1$. For $i = 0$, we want the last column of $I$, which is $e_n$. For (6.10) to be valid, $b_n$ must be 0, since all of the other vectors being added have a 0 as the last component. Hence, (6.10) becomes $b_1A^{n-1}e_n + \cdots + b_{n-1}Ae_n = 0$. Now only $A^{n-1}e_n$ has a non-zero entry (namely $c_n$) in position $n - 1$, so $b_1 = 0$. Continuing this argument, $b_1 = b_2 = \cdots = b_n = 0$. So the first $n$ powers of a companion matrix are linearly independent. Equating coefficients of these powers was therefore a valid operation, and there is no logical hole in the proof of Theorem 6.4.1.
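The claim about the last columns is easy to test. This Python sketch uses a hypothetical $5 \times 5$ companion matrix with $c_5 \neq 0$. It checks the claimed shape of $A^ie_n$ and also the resulting triangular structure: the last non-zero entries of $e_n, Ae_n, \ldots, A^{n-1}e_n$ occupy $n$ different positions, which forces every $b_i$ to vanish.

```python
def companion(c):
    n = len(c)
    a = [[0] * n for _ in range(n)]
    a[0] = list(c)
    for i in range(1, n):
        a[i][i - 1] = 1
    return a


def mat_vec(a, v):
    return [sum(a[i][t] * v[t] for t in range(len(v))) for i in range(len(a))]


def last_nonzero(vec):
    """0-based index of the last non-zero entry."""
    return max(i for i, x in enumerate(vec) if x != 0)


c = [3, 0, 1, 2, 5]            # hypothetical c_1..c_5 with c_5 != 0
n = len(c)
A = companion(c)

cols = [[0] * (n - 1) + [1]]   # A^0 e_n = e_n
for _ in range(1, n):
    cols.append(mat_vec(A, cols[-1]))   # A^i e_n

# Claim: for 1 <= i <= n-1, entry i (1-based) of A^i e_n is c_n and later entries are 0.
for i in range(1, n):
    assert cols[i][i - 1] == c[-1]
    assert all(x == 0 for x in cols[i][i:])

print(sorted(last_nonzero(v) for v in cols))   # [0, 1, 2, 3, 4]
```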


6.4.1 Matrices with repeated eigenvalues 


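Unfortunately, even companion matrices can fail to have distinct eigenvalues, and since we get only one $P_\lambda$ for each $\lambda$, it may not be possible to represent $I$ as a linear combination of the $P_\lambda$'s. But the last theorem suggests a natural generalization. If $\lambda_1$ is an eigenvalue of multiplicity $m$, then $\operatorname{ch}(\lambda)$ is divisible by $(\lambda - \lambda_1)^m$. So instead of having a single $P_{\lambda_1}$ given by $\dfrac{\operatorname{ch}(A)}{A - \lambda_1I}$, we can have instead $m$ matrices $P^{(1)}_{\lambda_1}, P^{(2)}_{\lambda_1}, \ldots, P^{(m)}_{\lambda_1}$ with

$$P^{(j)}_{\lambda_1} = \frac{\operatorname{ch}(A)}{(A - \lambda_1I)^j}.$$

This form has the pleasant consequence that

$$(A - \lambda_1I)P^{(j)}_{\lambda_1} = P^{(j-1)}_{\lambda_1} \quad\text{for } j > 1,$$

and, of course, $(A - \lambda_1I)P^{(1)}_{\lambda_1} = 0$.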
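As before, to study the behavior of powers of $A$ we look at an expansion of $I$. To keep the notation simple, we write the $P$'s without superscripts, and have

$$I = \sum_{i=1}^{n} a_iP_i,$$

which looks like the expansion we used before. The extra complication is that the formula for the $a_i$'s is slightly more complex. If $\lambda_i$ is a root of multiplicity 1, consider

$$\frac{\operatorname{ch}(A)}{A - \lambda_iI} = \frac{\operatorname{ch}(A)}{A - \lambda_iI}\sum_{j=1}^{n} a_jP_j = a_i\frac{\operatorname{ch}(A)}{A - \lambda_iI}P_i = a_i\operatorname{ch}'(\lambda_i)P_i,$$

and so $a_i = 1/\operatorname{ch}'(\lambda_i)$, since $P_i = \dfrac{\operatorname{ch}(A)}{A - \lambda_iI}$. If $\lambda_i$ is a root of multiplicity $m$, consider

$$\frac{\operatorname{ch}(A)}{(A - \lambda_iI)^m} = \frac{\operatorname{ch}(A)}{(A - \lambda_iI)^m}\sum_{j=1}^{n} a_jP_j = \frac{\operatorname{ch}(A)}{(A - \lambda_iI)^m}\left[a_1P^{(1)} + a_2P^{(2)} + \cdots + a_mP^{(m)}\right],$$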
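where $P^{(1)}, \ldots, P^{(m)}$ are the eigenmatrices associated with $\lambda_i$. Of course, the eigenmatrices associated with other eigenvalues are annihilated by this operator. For

$$r(\lambda) = \frac{\operatorname{ch}(\lambda)}{(\lambda - \lambda_i)^m}$$

we have

$$P^{(m)} = r(A)\left[a_1P^{(1)} + a_2P^{(2)} + \cdots + a_mP^{(m)}\right].$$

Since $(A - \lambda_iI)P^{(j)} = P^{(j-1)}$,

$$AP^{(2)} = \lambda_iP^{(2)} + P^{(1)}, \qquad A^2P^{(2)} = \lambda_i^2P^{(2)} + 2\lambda_iP^{(1)}.$$

So

$$A^lP^{(2)} = \lambda_i^lP^{(2)} + l\lambda_i^{l-1}P^{(1)},$$

and

$$A^lP^{(j)} = \lambda_i^lP^{(j)} + \binom{l}{1}\lambda_i^{l-1}P^{(j-1)} + \binom{l}{2}\lambda_i^{l-2}P^{(j-2)} + \cdots.$$

Hence,

$$
\begin{aligned}
P^{(m)} &= a_1r(\lambda_i)P^{(1)}\\
&\quad + a_2\left[r(\lambda_i)P^{(2)} + r'(\lambda_i)P^{(1)}\right]\\
&\quad + \cdots\\
&\quad + a_m\left[r(\lambda_i)P^{(m)} + r'(\lambda_i)P^{(m-1)} + \frac{r''(\lambda_i)}{2!}P^{(m-2)} + \cdots\right].
\end{aligned}
$$

Equating the coefficients of the $P$'s gives

$$
\begin{aligned}
1 &= a_mr(\lambda_i),\\
0 &= a_mr'(\lambda_i) + a_{m-1}r(\lambda_i),\\
0 &= a_m\frac{r''(\lambda_i)}{2!} + a_{m-1}r'(\lambda_i) + a_{m-2}r(\lambda_i),\\
&\;\;\vdots\\
0 &= a_m\frac{r^{(m-1)}(\lambda_i)}{(m-1)!} + a_{m-1}\frac{r^{(m-2)}(\lambda_i)}{(m-2)!} + \cdots + a_1r(\lambda_i).
\end{aligned}
$$

Since $r(\lambda_i)$ is non-zero, this is an invertible triangular system of $m$ linear equations in $m$ unknowns; it has a unique solution which can be found by back substitution. We can compute powers of $A$ by

$$A^k = \sum_{i=1}^{n} a_iA^kP_i,$$

but we may have that $AP^{(j)} = \lambda_iP^{(j)} + P^{(j-1)}$ if there are eigenvalues with multiplicity greater than 1. But these "extra terms" can only grow as a polynomial in $k$ times $\lambda_i^k$. Thus, if we assume that $\lambda_1$ is the eigenvalue of strictly greatest magnitude and that $\lambda_1$ is not a multiple root, then $\lim_{k\to\infty}A^k/\lambda_1^k$ does not have any extra terms, because $(\lambda_i/\lambda_1)^k$ times any polynomial in $k$ still goes to 0 as $k$ increases. This gives an improved form of Theorem 6.4.1.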


Theorem 6.4.2. If $A$ is a companion matrix and the dominant eigenvalue (the eigenvalue with largest magnitude) $\lambda_1$ is simple, then

$$\lim_{k\to\infty}\frac{A^k}{\lambda_1^k} = \frac{1}{\operatorname{ch}'(\lambda_1)}P_{\lambda_1}, \quad\text{where}\quad P_{\lambda_1} = \frac{\operatorname{ch}(A)}{A - \lambda_1I},$$

and this limiting matrix has rank 1. That is, there is some vector $X$ such that the limiting matrix can be written as

$$\begin{pmatrix}\alpha_1X & \alpha_2X & \cdots & \alpha_nX\end{pmatrix},$$

or said another way, every column of the limiting matrix is a scalar multiple of any other column of the limiting matrix.

Proof. If $Z$ is the limiting matrix, then it does satisfy the equation $AZ = \lambda_1Z$. Each column $Z_j$ of $Z$ also satisfies $AZ_j = \lambda_1Z_j$, and so

$$
\begin{aligned}
c_1Z_{j,1} + c_2Z_{j,2} + \cdots + c_nZ_{j,n} &= \lambda_1Z_{j,1},\\
Z_{j,1} &= \lambda_1Z_{j,2},\\
&\;\;\vdots\\
Z_{j,n-1} &= \lambda_1Z_{j,n},
\end{aligned}
$$

where $Z_{j,i}$ means the $i$th component of the vector $Z_j$. If $Z_{j,n}$ were known, then $Z_{j,n-1}, \ldots, Z_{j,1}$ could be computed. The top equation is redundant and merely says that $\lambda_1$ satisfies the characteristic polynomial. Thus, up to one unknown multiplier $Z_{j,n}$, this system of equations has a unique solution, and the theorem follows. □
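Theorem 6.4.2 allows repeated eigenvalues other than $\lambda_1$. As a sketch, take the hypothetical companion matrix with $\operatorname{ch}(\lambda) = (\lambda - 1)^2(\lambda - 3)$. The dominant eigenvalue 3 is simple, so the limit should be $P_3/\operatorname{ch}'(3)$ with $P_3 = (A - I)^2$, and it should have rank 1. In this example the columns of $(A - I)^2$ are multiples of $(9, 3, 1)^T = (\lambda_1^2, \lambda_1, 1)^T$, matching the recursion $Z_{j,i} = \lambda_1Z_{j,i+1}$ in the proof.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def companion(c):
    n = len(c)
    a = [[0] * n for _ in range(n)]
    a[0] = list(c)
    for i in range(1, n):
        a[i][i - 1] = 1
    return a


c = [5, -7, 3]        # ch(lam) = lam^3 - 5 lam^2 + 7 lam - 3 = (lam - 1)^2 (lam - 3)
n = len(c)
A = companion(c)
A_minus_I = [[A[r][s] - int(r == s) for s in range(n)] for r in range(n)]

P3 = mat_mul(A_minus_I, A_minus_I)          # ch(A) / (A - 3I) = (A - I)^2
ch_prime_3 = 3 * 3 ** 2 - 10 * 3 + 7        # ch'(3) = 4
limit = [[Fraction(x, ch_prime_3) for x in row] for row in P3]

k = 60
Ak = identity(n)
for _ in range(k):
    Ak = mat_mul(Ak, A)
error = max(abs(Fraction(Ak[r][s], 3 ** k) - limit[r][s])
            for r in range(n) for s in range(n))

# Rank 1: every 2x2 minor of the (non-zero) limit vanishes.
rank_one = all(limit[r1][s1] * limit[r2][s2] == limit[r1][s2] * limit[r2][s1]
               for r1 in range(n) for r2 in range(n)
               for s1 in range(n) for s2 in range(n))
print(P3)          # [[9, -18, 9], [3, -6, 3], [1, -2, 1]]
print(rank_one, float(error))
```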


6.5 Nonnegative Companion Matrices 


Although companion matrices capture the structure of Leslie matrices, they do not take into account the Leslie assumptions of nonnegativity. In this section we consider the extra properties that follow when all entries in a companion matrix are nonnegative.

The characteristic polynomial of either a Leslie matrix or a nonnegative companion matrix can be written in the form

$$\operatorname{ch}(\lambda) = \lambda^n - \sum_{i=1}^{n} c_i\lambda^{n-i},$$

where each $c_i \ge 0$ and $c_n > 0$. As shown in Section 5.1, such nonnegative polynomials are of two types: the primitive type, which has $\gcd\{i \mid c_i > 0\} = 1$, and the periodic type, which has $g = \gcd\{i \mid c_i > 0\} > 1$.
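To decide which type a polynomial belongs to, only the indices of its positive coefficients are needed. Here is a minimal Python sketch; `imprimitivity_index` is a hypothetical helper name.

```python
from functools import reduce
from math import gcd


def imprimitivity_index(c):
    """g = gcd{ i : c_i > 0 } for ch(lam) = lam^n - c_1 lam^(n-1) - ... - c_n.

    Assumes, as in the text, that every c_i >= 0 and c_n > 0.
    g == 1: primitive type; g > 1: periodic type with period g."""
    if any(ci < 0 for ci in c) or c[-1] <= 0:
        raise ValueError("expected c_i >= 0 and c_n > 0")
    return reduce(gcd, [i for i, ci in enumerate(c, start=1) if ci > 0])


print(imprimitivity_index([0, 1, 1]))          # 1: lam^3 - lam - 1 is primitive
print(imprimitivity_index([0, 0.5, 0, 0.5]))   # 2: lam^4 - lam^2/2 - 1/2
print(imprimitivity_index([0, 0, 6]))          # 3: only c_3 > 0
```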

A primitive polynomial has exactly one positive real root $\lambda_0$, and this root is dominant in the sense that $\lambda_0 > |\lambda_i|$ for any other root $\lambda_i$. A periodic polynomial with period $g$ has non-zero coefficients only at positions that are multiples of $g$, which means that $\lambda$ appears in the polynomial only as powers of $\lambda^g$. Therefore, the characteristic polynomial of a periodic nonnegative companion matrix can be viewed as a polynomial $p(x)$, where $x = \lambda^g$ and $p(x)$ is now a primitive polynomial with a unique positive real root $x_0$. The positive root of $\operatorname{ch}(\lambda)$ is the positive solution to $\lambda^g = x_0$, and since $x_0$ is positive, $\lambda_0 = x_0^{1/g}$. This $\lambda_0$ is not strictly dominant, because $\lambda^g = x_0$ has $g$ roots all of the same magnitude, and these equal-magnitude roots give rise to oscillations and asymptotic periodic behavior of period $g$. We consider these possibilities in the next subsection. Oscillations are also possible in companion matrices that have some negative entries. (Are these non-nonnegative?) We do not consider such matrices in detail, but in Exercise 6.19 there is an example of a companion matrix that behaves periodically even though its characteristic polynomial is not periodic.

Since a nonnegative companion matrix with a primitive characteristic polynomial has a dominant eigenvalue, we can prove a stronger version of Theorem 6.4.2 for such matrices.

Theorem 6.5.1. If $A$ is a nonnegative companion matrix with a primitive characteristic polynomial, then there is a power of $A$ that is strictly positive, $A$ has a strictly dominant positive eigenvalue $\lambda_0$,

$$\lim_{k\to\infty}\frac{A^k}{\lambda_0^k} = \frac{1}{\operatorname{ch}'(\lambda_0)}P_{\lambda_0}, \quad\text{where}\quad P_{\lambda_0} = \frac{\operatorname{ch}(A)}{A - \lambda_0I},$$

and this limiting matrix is a strictly positive matrix with rank 1.

Proof. Nonnegativity implies that $\lambda_0$ is the dominant eigenvalue, and from Exercise 5.2 we know that $\operatorname{ch}'(\lambda_0) > 0$. Theorem 6.4.2 implies convergence. We are left with showing that the limiting matrix is strictly positive. By the division algorithm, $\operatorname{ch}(\lambda)/(\lambda - \lambda_0) = \lambda^{n-1} + g_1\lambda^{n-2} + \cdots + g_{n-1}$, where the coefficients are

$$
\begin{aligned}
g_1 &= \lambda_0 - c_1,\\
g_2 &= \lambda_0^2 - c_1\lambda_0 - c_2,\\
&\;\;\vdots\\
g_{n-1} &= \lambda_0^{n-1} - c_1\lambda_0^{n-2} - \cdots - c_{n-1} = c_n/\lambda_0.
\end{aligned}
$$

Clearly, $g_{n-1} > 0$. Since $g_{i-1} = (g_i + c_i)/\lambda_0$ with $c_i \ge 0$, this implies that $g_{n-2}$ through $g_1$ are also positive. (Also, refer to Exercise 5.14.) So $P_{\lambda_0} = A^{n-1} + g_1A^{n-2} + \cdots + g_{n-1}I$, which is a positive sum of nonnegative matrices. But can it happen that, for some position $(i, j)$, none of these powers of $A$ has a positive entry there? If we look at the graph $G_A$ corresponding to $A$ (see Section 7.3 for more details on the correspondence between a nonnegative matrix and its graph), then $A^r_{ij}$, the $(i, j)$th entry of the $r$th power of $A$, is positive when there is a path from $v_i$ to $v_j$ of length $r$ in $G_A$. The graph for a nonnegative companion matrix contains at least the edges in the following diagram, which form the cycle $v_1 \to v_n \to v_{n-1} \to \cdots \to v_2 \to v_1$ (the edge $v_1 \to v_n$ comes from $c_n > 0$ and the others from the ones on the subdiagonal),

so there is a path between any pair of vertices, and the length of this path is between 0 and $n - 1$. Hence, $A^r_{ij}$ is positive for some $0 \le r \le n - 1$, which means that the positive sum $P_{\lambda_0}$ of these powers of $A$ is strictly positive. □
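The theorem can be illustrated in floating point. This Python sketch takes the hypothetical nonnegative companion matrix for $\operatorname{ch}(\lambda) = \lambda^3 - \lambda - 1$, so $c_1 = 0$, $c_2 = c_3 = 1$, and $\gcd\{2, 3\} = 1$. It finds $\lambda_0$ by bisection, builds $P_{\lambda_0}$ from the coefficients $g_i$ of the proof, and compares the result with powers of $A/\lambda_0$. The check that some power of $A$ is positive uses Wielandt's bound $(n-1)^2 + 1$ for primitive matrices. That is a standard result which is not proved in this section.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def companion(c):
    n = len(c)
    a = [[0.0] * n for _ in range(n)]
    a[0] = [float(x) for x in c]
    for i in range(1, n):
        a[i][i - 1] = 1.0
    return a


def positive_root(c, iterations=200):
    """Positive root of ch(x) = x^n - c_1 x^(n-1) - ... - c_n by bisection.

    With c_i >= 0 and c_n > 0, ch(0) = -c_n < 0 and ch(1 + sum c) > 0."""
    def ch(x):
        value = 1.0
        for ci in c:
            value = value * x - ci          # Horner's method
        return value
    lo, hi = 0.0, 1.0 + sum(c)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ch(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = [0, 1, 1]                  # ch = lam^3 - lam - 1, gcd{2, 3} = 1 (primitive)
n = len(c)
A = companion(c)
lam0 = positive_root(c)

# Coefficients of ch(lam)/(lam - lam0): g_1 = lam0 - c_1, g_i = lam0 g_(i-1) - c_i.
g = [1.0]
for ci in c[:-1]:
    g.append(lam0 * g[-1] - ci)          # g = [1, g_1, ..., g_(n-1)]

identity = [[float(i == j) for j in range(n)] for i in range(n)]
low = [identity]
for _ in range(n - 1):
    low.append(mat_mul(low[-1], A))
P = [[sum(g[j] * low[n - 1 - j][r][s] for j in range(n)) for s in range(n)]
     for r in range(n)]
ch_prime = n * lam0 ** (n - 1) - sum((n - i) * c[i - 1] * lam0 ** (n - i - 1)
                                     for i in range(1, n))
limit = [[x / ch_prime for x in row] for row in P]

# Powers of A / lam0 (scaling every step keeps the numbers bounded).
scaled = [[x / lam0 for x in row] for row in A]
B = identity
for _ in range(150):
    B = mat_mul(B, scaled)
error = max(abs(B[r][s] - limit[r][s]) for r in range(n) for s in range(n))

# Some power of A is strictly positive (Wielandt: exponent (n-1)^2 + 1 suffices).
Q = identity
for _ in range((n - 1) ** 2 + 1):
    Q = mat_mul(Q, A)

print(lam0, min(min(row) for row in limit), error)
```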


6.5.1 Periodic nonnegative companion matrices 


Now that we understand the convergence properties of primitive companion matrices (that is, companion matrices whose characteristic polynomials are primitive), we would like to consider nonnegative companion matrices that are periodic in the sense that their characteristic polynomials can be written in the form

$$\operatorname{ch}(\lambda) = (\lambda^g)^{n/g} - c_g(\lambda^g)^{(n-g)/g} - c_{2g}(\lambda^g)^{(n-2g)/g} - \cdots - c_n,$$

and these can be viewed as polynomials in $\lambda^g$ rather than as polynomials in $\lambda$. We want to reduce the periodic case to the primitive case. Here "reduce" means that we would like to represent or think about an imprimitive matrix as several primitive matrices, apply the theorems about primitive matrices, and then put the results together to obtain an analysis of imprimitive matrices. This idea of reduction is a central unifying concept in all of mathematics, and is also the basis for divide-and-conquer algorithms [102] in computer science. (See also Section 9.4.)

Assume that we have a matrix $L$ with $g = 2$, where this $g$ is, of course, the greatest common divisor of the cycle lengths in the graph of the matrix. This quantity is also called the index of imprimitivity, but it is simply easier to call it $g$. We can associate a graph $G(L)$ with the matrix.

We've chosen to use the convention that if $L_{ij} > 0$, there is an edge from $v_j$ to $v_i$. This is the opposite direction to that used in the previous diagram. There is an edge from $v_n$ to $v_1$ and a cycle of length $n$ passing through this edge. The assumption that $g = 2$ implies that $n$ is even. So the graph looks like this:

There may be some other edges into $v_1$, but they all come from vertices in the second column, vertices with an even index because $g = 2$. Now let's form a new graph whose edges are the paths of length 2 in $G(L)$; a part of this graph looks like this:
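[Figure: part of the graph of paths of length 2 in $G(L)$, with the odd-indexed vertices $v_1, v_3, v_5, \ldots, v_{n-1}$ joined among themselves and the even-indexed vertices $v_2, v_4, \ldots$ joined among themselves.]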


The old edges from $v_j$ to $v_1$ ($j$ even) now appear as edges from $v_j$ to $v_2$ and as edges from $v_{j-1}$ to $v_1$, but there are no edges from an even vertex to an odd vertex, and there are no edges from an odd vertex to an even vertex. Notice that the original connected graph has been split into two graphs with no edges between them. Further notice that the two new graphs are identical (isomorphic). By the usual reasoning about adjacency matrices, $L^2$ is the matrix that corresponds to this graph with two parts. If we permute the columns of $L^2$ so that the odd positions appear first and we permute the rows in the same way, we obtain a matrix of the form

$$\begin{pmatrix}A & 0\\ 0 & A\end{pmatrix}, \quad\text{where}\quad A = \begin{pmatrix}c_2 & c_4 & \cdots & c_{n-2} & c_n\\ 1 & 0 & \cdots & 0 & 0\\ \vdots & \ddots & & & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{pmatrix}.$$

So, pleasantly enough, $A$ is a companion matrix, and in fact a primitive companion matrix because using paths of length two divides the cycle lengths by 2, and converts $g = 2$ for $L$ into $g = 1$ for $A$.

At this point we should recall a few facts about permutation matrices. A permutation matrix $\Pi$ has exactly one 1 in each row and each column and 0's elsewhere. If $A$ is a matrix and $\Pi$ is a permutation matrix, then $A\Pi$ is a matrix with the columns of $A$ permuted. The position of the 1 in the $i$th column of $\Pi$ determines which column of $A$ appears as the $i$th column of $A\Pi$. Similarly, $\Pi A$ is a matrix with the rows permuted. The position of the 1 in the $i$th row of $\Pi$ determines which row of $A$ appears as the $i$th row of $\Pi A$. Now consider the effect of applying $\Pi^T$, the transpose of $\Pi$, the matrix whose rows are the columns of $\Pi$. Then $\Pi^TA\Pi$ is the matrix that results from permuting the rows of $A\Pi$ in the same way as the columns of $A$ were permuted in $A\Pi$. This is exactly the kind of operation we want to carry out on imprimitive matrices. Finally, consider $\Pi^T\Pi$. The single 1 in the $i$th row of $\Pi^T$ matches the single 1 in the $i$th column of $\Pi$, and since there are no other 1's in the same row as this 1, the product matrix has in its $i$th row a 1 only at position $i$. Hence, $\Pi^T\Pi$ equals the identity matrix, and we conclude that the transpose of a permutation matrix is the inverse of the permutation matrix.
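For our example matrix $L$, the appropriate $\Pi$ permutes the odd columns so that they lie in front of the even columns, and

$$\Pi^TL^2\Pi = \begin{pmatrix}A & 0\\ 0 & A\end{pmatrix} \quad\text{and}\quad L^2 = \Pi\begin{pmatrix}A & 0\\ 0 & A\end{pmatrix}\Pi^T.$$

Now consider $(\Pi^TL^2\Pi)(\Pi^TL^2\Pi) = \Pi^TL^4\Pi$, since $\Pi\Pi^T = I$. This gives

$$\Pi^TL^{2k}\Pi = \begin{pmatrix}A^k & 0\\ 0 & A^k\end{pmatrix}.$$

Of course, $A$ satisfies the hypotheses of Theorem 6.5.1, so

$$\lim_{k\to\infty}\frac{A^k}{\mu_0^k} = \frac{1}{\operatorname{ch}_A'(\mu_0)}\,\frac{\operatorname{ch}_A(A)}{A - \mu_0I},$$

where $\operatorname{ch}_A(\lambda)$ is the characteristic polynomial of $A$ and $\mu_0$ is the unique positive root of this nonnegative polynomial. Most conveniently, $\operatorname{ch}_A(\lambda^2) = \operatorname{ch}_L(\lambda)$, and replacing $\lambda^2$ by $\lambda$ gives the characteristic polynomial for $A$. This also means that the dominant eigenvalue $\mu_0$ for $A$ is the square of the dominant eigenvalue $\lambda_0$ for $L$. Hence,

$$\lim_{k\to\infty}\frac{L^{2k}}{\lambda_0^{2k}} = \Pi\begin{pmatrix}\displaystyle\lim_{k\to\infty}\frac{A^k}{\mu_0^k} & 0\\ 0 & \displaystyle\lim_{k\to\infty}\frac{A^k}{\mu_0^k}\end{pmatrix}\Pi^T = \frac{1}{\operatorname{ch}_A'(\mu_0)}\,\Pi\begin{pmatrix}\dfrac{\operatorname{ch}_A(A)}{A - \mu_0I} & 0\\ 0 & \dfrac{\operatorname{ch}_A(A)}{A - \mu_0I}\end{pmatrix}\Pi^T,$$

where $I$ is the $\frac{n}{2} \times \frac{n}{2}$ identity matrix. While this formula gives an answer, the matrix without the $\Pi$'s might be more intuitive:

$$\lim_{k\to\infty}\frac{L^{2k}}{\lambda_0^{2k}} = \begin{pmatrix}
\alpha_1 & 0 & \alpha_2 & 0 & \cdots & \alpha_{n/2} & 0\\
0 & \alpha_1 & 0 & \alpha_2 & \cdots & 0 & \alpha_{n/2}\\
\alpha_1\beta_2 & 0 & \alpha_2\beta_2 & 0 & \cdots & \alpha_{n/2}\beta_2 & 0\\
0 & \alpha_1\beta_2 & 0 & \alpha_2\beta_2 & \cdots & 0 & \alpha_{n/2}\beta_2\\
\vdots & & & & & & \vdots\\
\alpha_1\beta_{n/2} & 0 & \alpha_2\beta_{n/2} & 0 & \cdots & \alpha_{n/2}\beta_{n/2} & 0\\
0 & \alpha_1\beta_{n/2} & 0 & \alpha_2\beta_{n/2} & \cdots & 0 & \alpha_{n/2}\beta_{n/2}
\end{pmatrix}.$$

Here we are trying to explicitly display two facts: that the limiting matrix for $A$ has rank one and that the permutations $\Pi$ and $\Pi^T$ spread out the two copies of the limiting matrix.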
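The permutation step can be checked on a small hypothetical example with $g = 2$: the companion matrix $L$ for $\operatorname{ch}_L(\lambda) = \lambda^4 - \lambda^2 - 2$, so $c_2 = 1$ and $c_4 = 2$. The Python sketch below forms $\Pi^TL^2\Pi$ with the odd positions first and recovers two copies of the companion matrix $A$ with first row $(c_2, c_4)$. The helper `permutation_matrix` is an illustrative name.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def companion(c):
    n = len(c)
    a = [[0] * n for _ in range(n)]
    a[0] = list(c)
    for i in range(1, n):
        a[i][i - 1] = 1
    return a


def permutation_matrix(order):
    """Pi with Pi[order[j]][j] = 1, so column j of M Pi is column order[j] of M."""
    n = len(order)
    pi = [[0] * n for _ in range(n)]
    for j, src in enumerate(order):
        pi[src][j] = 1
    return pi


c = [0, 1, 0, 2]           # ch_L(lam) = lam^4 - lam^2 - 2 = (lam^2 - 2)(lam^2 + 1), g = 2
n = len(c)
L = companion(c)
order = [0, 2, 1, 3]       # positions 1, 3 first, then 2, 4 (0-based indices)
Pi = permutation_matrix(order)
PiT = transpose(Pi)
identity = [[int(i == j) for j in range(n)] for i in range(n)]

print(mat_mul(PiT, Pi) == identity)          # True: Pi^T is the inverse of Pi

blocks = mat_mul(mat_mul(PiT, mat_mul(L, L)), Pi)
print(blocks)      # [[1, 2, 0, 0], [1, 0, 0, 0], [0, 0, 1, 2], [0, 0, 1, 0]]

A = companion([c[1], c[3]])   # first row c_2, c_4
print(A)           # [[1, 2], [1, 0]]
```

Here $\operatorname{ch}_A(\mu) = \mu^2 - \mu - 2 = (\mu - 2)(\mu + 1)$, so $\mu_0 = 2 = \lambda_0^2$ with $\lambda_0 = \sqrt{2}$, consistent with $\operatorname{ch}_A(\lambda^2) = \operatorname{ch}_L(\lambda)$.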
Let us call this limiting matrix $L_\infty$ and look at the action of $L$ on $L_\infty$. Clearly, either by the derivation or direct calculation, $L^2L_\infty = \lambda_0^2L_\infty$. But what about $LL_\infty$? This operation results in a matrix that is in a sense complementary to $L_\infty$. That is, the produced matrix, let us call it $D_\infty$, has 0's where $L_\infty$ has positive entries, and $D_\infty$ has positive entries where $L_\infty$ has 0's. Unfortunately, the values are not quite so nice. In $L_\infty$, each column is either a multiple of the first column or a multiple of the first column shifted down one component. In $D_\infty$, the odd-position columns all start with 0 and are multiples of the first column of $D_\infty$. The even-position columns are all multiples of the second column of $D_\infty$, and this second column begins with a positive entry, but it is not necessary that the second column be a multiple of a shifted copy of the first column.

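The limiting behavior for general $g$ follows immediately from our example with $g = 2$.

Theorem 6.5.2. For a nonnegative companion matrix $L$ with period $g \ge 2$,

$$\lim_{k\to\infty}\frac{L^{gk}}{\lambda_0^{gk}} = L_0 = \frac{1}{\operatorname{ch}_A'(\mu_0)}\,\Pi\begin{pmatrix}\dfrac{\operatorname{ch}_A(A)}{A - \mu_0I} & & \\ & \ddots & \\ & & \dfrac{\operatorname{ch}_A(A)}{A - \mu_0I}\end{pmatrix}\Pi^T \quad (g \text{ diagonal blocks}),$$

where $\mu_0 = \lambda_0^g$,

$$A = \begin{pmatrix}f_g & f_{2g} & \cdots & f_n\\ 1 & 0 & \cdots & 0\\ & \ddots & & \vdots\\ 0 & \cdots & 1 & 0\end{pmatrix}, \qquad \operatorname{ch}_A(\lambda) = \lambda^{n/g} - f_g\lambda^{n/g-1} - \cdots - f_n,$$

and $\Pi$ is the permutation matrix that moves the columns with indices $\equiv 1 \pmod g$ before those with indices $\equiv 2 \pmod g$, ..., before those with indices $\equiv 0 \pmod g$. Also:

(a) $L_0$ has rank $g$.
(b) If $i \equiv j \pmod g$, the $i$th column of $L_0$ has positive entries in positions $j, j+g, j+2g, \ldots$ and zeros elsewhere.
(c) Every column of $L_0$ is a shifted multiple of the first column.

Let $L_r = (L/\lambda_0)^rL_0$. Then:

(a) $L_0, L_1, \ldots, L_{g-1}$ are linearly independent.
(b) If $i + r \equiv j \pmod g$, the $i$th column of $L_r$ has positive entries in positions $j, j+g, j+2g, \ldots$ and zeros elsewhere.
(c) $(L/\lambda_0)^gL_r = L_r$.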
6.6 Back to Leslie Matrices

Up to this point we have used companion matrices rather than Leslie matrices because the survival rates in the Leslie matrices complicate things slightly. If $B$ is a Leslie matrix,

$$B = \begin{pmatrix}f_1 & f_2 & \cdots & f_{n-1} & f_n\\ s_1 & 0 & \cdots & 0 & 0\\ 0 & s_2 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & s_{n-1} & 0\end{pmatrix},$$

then $B$ can be converted to the companion matrix $L$ via the diagonal matrix $S$ by $L = S^{-1}BS$, where

$$S = \operatorname{diag}(1, s_1, s_1s_2, \ldots, s_1\cdots s_{n-1}), \qquad S^{-1} = \operatorname{diag}\left(1, \frac{1}{s_1}, \frac{1}{s_1s_2}, \ldots, \frac{1}{s_1\cdots s_{n-1}}\right),$$

and

$$L = \begin{pmatrix}f_1 & s_1f_2 & s_1s_2f_3 & \cdots & s_1\cdots s_{n-1}f_n\\ 1 & 0 & \cdots & & 0\\ & \ddots & & & \vdots\\ 0 & & \cdots & 1 & 0\end{pmatrix}.$$

This claim can be easily verified by direct calculation. Of course, we also have $B = SLS^{-1}$ and $B^k = SL^kS^{-1}$. Hence, the asymptotic behavior of $B$ can be directly calculated from the asymptotic behavior of the companion matrix $L$. Each entry of $B^k$ can be obtained from the corresponding entry of $L^k$ by multiplying and dividing by products of the survival rates. Specifically, the $(i, j)$th entry is obtained by multiplying by $s_1\cdots s_{i-1}$ and dividing by $s_1\cdots s_{j-1}$. So the $(i, j)$th multiplier is

$$\frac{s_1\cdots s_{i-1}}{s_1\cdots s_{j-1}} = \begin{cases}s_j\cdots s_{i-1} & \text{if } i > j,\\ 1 & \text{if } i = j,\\ 1/(s_i\cdots s_{j-1}) & \text{if } i < j.\end{cases}$$
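The similarity can be verified with exact arithmetic. The Python sketch below uses hypothetical rates $f = (0, 2, 3)$ and $s = (1/2, 1/3)$. It forms $L = S^{-1}BS$, the direction that makes $B = SLS^{-1}$, and checks the multiplier rule $d_i/d_j$, where $d = (1, s_1, s_1s_2)$ is the diagonal of $S$.

```python
from fractions import Fraction as Fr


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def leslie(fertility, survival):
    """Fertility rates on the first row, survival rates on the subdiagonal."""
    n = len(fertility)
    m = [[Fr(0)] * n for _ in range(n)]
    m[0] = list(fertility)
    for i, si in enumerate(survival):
        m[i + 1][i] = si
    return m


f = [Fr(0), Fr(2), Fr(3)]          # hypothetical fertility rates f_1, f_2, f_3
s = [Fr(1, 2), Fr(1, 3)]           # hypothetical survival rates s_1, s_2
n = len(f)
B = leslie(f, s)

d = [Fr(1)]                        # diagonal of S: 1, s_1, s_1 s_2, ...
for si in s:
    d.append(d[-1] * si)
S = [[d[i] if i == j else Fr(0) for j in range(n)] for i in range(n)]
S_inv = [[1 / d[i] if i == j else Fr(0) for j in range(n)] for i in range(n)]

L = mat_mul(mat_mul(S_inv, B), S)  # companion form: first row f_1, s_1 f_2, s_1 s_2 f_3
print(L == [[0, 1, Fr(1, 2)], [1, 0, 0], [0, 1, 0]])    # True

# B^k = S L^k S^(-1): entry (i, j) of B^k is entry (i, j) of L^k times d[i] / d[j].
k = 5
Bk, Lk = B, L
for _ in range(k - 1):
    Bk, Lk = mat_mul(Bk, B), mat_mul(Lk, L)
print(all(Bk[i][j] == Lk[i][j] * d[i] / d[j]
          for i in range(n) for j in range(n)))         # True
```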
If B is a primitive Leslie matrix, the corresponding companion matrix L is also primitive and there is an analog to Theorem 6.5.1. Moreover, S and $S^{-1}$ can be factored across powers and limits and then applied to the limiting matrix. Since the limiting matrix is a polynomial in the companion matrix, applying S and $S^{-1}$ converts this to a polynomial in the Leslie matrix. Hence the convergence theorem for primitive Leslie matrices is identical to the convergence theorem for primitive companion matrices:


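Theorem 6.6.1. If A is a primitive Leslie matrix, then A has a strictly dominant positive eigenvalue $\lambda_0$,

$$\lim_{k\to\infty}\frac{A^k}{\lambda_0^k} = \frac{1}{\mathrm{ch}_A'(\lambda_0)}\,P_{\lambda_0}, \qquad\text{where } P_{\lambda_0} = \frac{\mathrm{ch}_A(A)}{A-\lambda_0 I},$$

and this limiting matrix is a strictly positive matrix with rank 1.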
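Theorem 6.6.1 can be checked numerically. In the Python sketch below (an illustration under assumed, hypothetical rates $f = (0, 2, 3)$ and $s = (0.5, 0.8)$, which give a primitive matrix because $\gcd\{2, 3\} = 1$), $\lambda_0$ is found by bisection, $P_{\lambda_0}$ is formed by synthetic division of $\mathrm{ch}_A(x)$ by $x - \lambda_0$ followed by Horner evaluation at A, and the result is compared with $A^k/\lambda_0^k$ for a large k. Bisection suffices because, for $x > 0$, $\mathrm{ch}_A(x)/x^n$ is increasing, so $\mathrm{ch}_A$ changes sign exactly once on the positive axis.

```python
def matmul(A, B):
    """Classical product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def leslie(f, s):
    """Fertilities f1..fn in row 1, survival rates s1..s_{n-1} below the diagonal."""
    n = len(f)
    A = [[0.0] * n for _ in range(n)]
    A[0] = [float(x) for x in f]
    for i in range(n - 1):
        A[i + 1][i] = float(s[i])
    return A

def char_coeffs(f, s):
    """ch(x) = x^n - f1 x^{n-1} - s1 f2 x^{n-2} - ... - s1...s_{n-1} fn, highest degree first."""
    c, prod = [1.0], 1.0
    for i, fi in enumerate(f):
        c.append(-prod * fi)
        if i < len(s):
            prod *= s[i]
    return c

def horner(c, x):
    v = 0.0
    for a in c:
        v = v * x + a
    return v

def dominant_root(c):
    """Bisection: ch < 0 between 0 and lambda0, ch > 0 beyond it."""
    lo, hi = 0.0, 1.0 + sum(abs(a) for a in c[1:])
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if horner(c, mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def limit_matrix(A, c, lam):
    """P_{lambda0} / ch'(lambda0), where P_{lambda0} = q(A) and q(x) = ch(x) / (x - lambda0)."""
    n = len(A)
    q = [1.0]
    for a in c[1:-1]:              # synthetic division of ch(x) by (x - lam)
        q.append(q[-1] * lam + a)
    Q = [[float(i == j) for j in range(n)] for i in range(n)]   # leading coefficient q[0] = 1
    for a in q[1:]:                # Horner's rule with a matrix argument
        Q = matmul(Q, A)
        for i in range(n):
            Q[i][i] += a
    dq = horner(q, lam)            # ch'(lambda0) = q(lambda0)
    return [[x / dq for x in row] for row in Q]

f, s = [0, 2, 3], [0.5, 0.8]       # hypothetical rates; gcd{2, 3} = 1
A = leslie(f, s)
c = char_coeffs(f, s)
lam = dominant_root(c)
P = limit_matrix(A, c, lam)

M = [[float(i == j) for j in range(3)] for i in range(3)]
for _ in range(300):               # M becomes A^300 / lam^300, normalized at every step
    M = [[x / lam for x in row] for row in matmul(A, M)]
err = max(abs(M[i][j] - P[i][j]) for i in range(3) for j in range(3))
print(lam, err)
```

Normalizing at every step keeps the entries of moderate size; the 2 by 2 minors of P should vanish (rank 1) and all its entries should be positive.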


6.6.1 Periodic Leslie matrices 


For periodic Leslie matrices we can obtain a convergence theorem by taking Theorem 6.5.2 and multiplying the companion matrices by S and $S^{-1}$. In a moment we will restate that theorem, this time for Leslie matrices rather than companion matrices. An obvious question to ask is, why did we bother to analyze companion matrices rather than looking directly at Leslie matrices? The answer is that if we took a Leslie matrix with g > 1, took the $g$th power, and then permuted rows and columns, the resulting matrix would become

$$\begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_g \end{pmatrix},$$

where the matrices $A_1, \ldots, A_g$ are Leslie matrices but they are not necessarily identical. The complication comes from the fact that the survival rates can be different. (The matrices $A_i$ are similar, but that is somewhat messy to check.) By using companion matrices, the corresponding $A_i$ are identical because their subdiagonals are strings of ones.


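Theorem 6.6.2. For a Leslie matrix L with index of imprimitivity g,

$$\lim_{k\to\infty}\frac{L^{gk}}{\lambda_0^{gk}} = L_0 = S\,\Pi\begin{pmatrix} P & & \\ & \ddots & \\ & & P \end{pmatrix}\Pi^{T}S^{-1}, \qquad P = \frac{\mathrm{ch}_A(A)}{\mathrm{ch}_A'(\lambda_0^g)\,(A-\lambda_0^g I)},$$

where

$$A = \begin{pmatrix} s_1\cdots s_{g-1}f_g & s_1\cdots s_{2g-1}f_{2g} & \cdots & s_1\cdots s_{n-1}f_n \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{pmatrix},$$

$$\mathrm{ch}_A(\lambda) = \lambda^{n/g} - s_1\cdots s_{g-1}f_g\,\lambda^{n/g-1} - \cdots - s_1\cdots s_{n-1}f_n, \qquad \mathrm{ch}_A(\lambda_0^g) = 0,$$

$$S = \mathrm{diag}(1,\ s_1,\ s_1s_2,\ \ldots,\ s_1\cdots s_{n-1}),$$

and $\Pi$ is the appropriate permutation matrix. Also:

(a) $L_0$ has rank g.

(b) If $i \equiv j \pmod g$, the $i$th column of $L_0$ has positive entries in positions $j, j+g, j+2g, \ldots$.

(c) Every column of $L_0$ is a shifted multiple of the first column.

Let $L_r = (L/\lambda_0)^r L_0$, $0 \le r < g$. Then:

(a) $L_0, L_1, \ldots, L_{g-1}$ are linearly independent.

(b) If $i + r \equiv j \pmod g$, the $i$th column of $L_r$ has positive entries in positions $j, j+g, j+2g, \ldots$.

(c) $(L/\lambda_0)^g L_r = L_r$.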
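To see Theorem 6.6.2 at work, the Python sketch below (hypothetical rates with fertility only at ages 2 and 4, so g = 2) compares normalized powers of a periodic Leslie matrix. The powers $L^{gk}/\lambda_0^{gk}$ settle down, while consecutive powers $L^k/\lambda_0^k$ keep alternating, and the zero pattern of the limit matches part (b). Indices are 0-based in the code, which does not change congruences mod g.

```python
def matmul(A, B):
    """Classical product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def leslie(f, s):
    n = len(f)
    B = [[0.0] * n for _ in range(n)]
    B[0] = [float(x) for x in f]
    for i in range(n - 1):
        B[i + 1][i] = float(s[i])
    return B

def char_coeffs(f, s):
    """ch(x) = x^n - f1 x^{n-1} - s1 f2 x^{n-2} - ..., highest degree first."""
    c, prod = [1.0], 1.0
    for i, fi in enumerate(f):
        c.append(-prod * fi)
        if i < len(s):
            prod *= s[i]
    return c

def dominant_root(c):
    """Bisection for the unique positive root of ch."""
    def ch(x):
        v = 0.0
        for a in c:
            v = v * x + a
        return v
    lo, hi = 0.0, 1.0 + sum(abs(a) for a in c[1:])
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ch(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def power(A, k):
    R = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        R = matmul(R, A)
    return R

def dist(A, B):
    """Largest entrywise difference."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

f, s, g = [0, 2, 0, 3], [0.5, 0.8, 0.6], 2   # hypothetical; births only at even ages
L = leslie(f, s)
lam = dominant_root(char_coeffs(f, s))
N = [[x / lam for x in row] for row in L]     # L / lambda0

L0 = power(N, 200)                            # approximates L_0 = lim L^{gk} / lambda0^{gk}
print(dist(L0, power(N, 202)))                # small: the g-th powers converge
print(dist(L0, power(N, 201)))                # not small: L^k / lambda0^k alternates
```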


Figures 6.1 and 6.2 are from the Census Bureau and give the predicted 
number of males and females in the United States for two different years. 
Both of them are projections; that is, they are estimates based on a model 
and data gathered at an earlier date. Figure 6.1 shows the estimate for 2000, 
and Figure 6.2 shows the estimate for 2050. The age classes begin with the 
youngest on the bottom and proceed to the oldest on the top. In this form, 
we might expect the graphs to have a pyramid form rather than the in- 
verted pyramid form expected when the oldest age class is on the bottom. 
Figure 6.1 shows that the U.S. population is not in equilibrium, because 
the “baby boom” is still passing through the population. Figure 6.2 shows a 
prediction that the population in 2050 will have reached a distribution that 
has a rough pyramid shape. We should mention that the U.S. Census Bu- 
reau uses models that are more complicated than the simple Leslie model; 
for example, their models include immigration. In spite of this, an approach 
to a pyramid shape is still predicted. A conclusion like this, which does not 
depend on the details of the mathematical model, is called robust. Sci- 
entists generally have more confidence in robust predictions because they 
know that their model omits many details or variables. 
(NP-P2) Projected Resident Population of the United States as of July 1, 2000, Middle Series.

[Figure: population pyramid by five-year age class (Under 5, 5 to 9, ..., 95 to 99, 100 and over), males on the left and females on the right; horizontal axis: Percent.]

Source: National Projections Program, Population Division, U.S. Census Bureau, Washington, D.C. 20233

FIGURE 6.1. Projected U.S. Population in 2000.


(NP-P4) Projected Resident Population of the United States as of July 1, 2050, Middle Series.

Source: National Projections Program, Population Division, U.S. Census Bureau, Washington, D.C. 20233

FIGURE 6.2. Projected U.S. Population in 2050.
6.6.2 Averaging


As we have just seen, the asymptotic behavior of a periodic Leslie matrix is 
more complicated than the straightforward convergence result for primitive 
matrices. Here we want to look at the “average” behavior of a periodic 
matrix over its period, and we show that a convergence theorem similar to 
that for primitive matrices can be obtained. 


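Theorem 6.6.3 (Averaging Theorem). If L is an $n \times n$ Leslie matrix (or a nonnegative companion matrix) with period g > 1, then

$$\lim_{k\to\infty}\frac{1}{g}\sum_{i=0}^{g-1}\frac{L^{k+i}}{\lambda_0^{k+i}} = \frac{1}{\mathrm{ch}_L'(\lambda_0)}\,P_{\lambda_0} = \frac{1}{\mathrm{ch}_L'(\lambda_0)}\,\frac{\mathrm{ch}_L(L)}{L-\lambda_0 I},$$

where $P_{\lambda_0} = \mathrm{ch}_L(L)/(L-\lambda_0 I)$, that is, the polynomial $\mathrm{ch}_L(x)/(x-\lambda_0)$ evaluated at L, is a strictly positive matrix of rank 1.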


Proof. As we saw in Theorem 5.1.4, L has g roots of maximum modulus; namely, $\lambda_0, \omega\lambda_0, \ldots, \omega^{g-1}\lambda_0$, where $\omega$ is a principal $g$th root of unity. For each of these simple roots there is an eigenmatrix $P_j$ such that $LP_j = \lambda_0\omega^jP_j$. As in our previous arguments, we can expand I in terms of these eigenmatrices, which are again linearly independent. Applying $(L/\lambda_0)^k$ and taking the limit, all terms except those corresponding to the maximal-modulus eigenvalues disappear. So we need only look at

$$\frac{1}{g}\sum_{i=0}^{g-1}\Bigl(\frac{L}{\lambda_0}\Bigr)^{k+i}\sum_{j=0}^{g-1}a_jP_j = \frac{1}{g}\sum_{i=0}^{g-1}\sum_{j=0}^{g-1}a_j\,\omega^{j(k+i)}P_j = \frac{1}{g}\sum_{j=0}^{g-1}a_j\,\omega^{jk}P_j\sum_{i=0}^{g-1}\omega^{ji}.$$

Now, since $\omega$ is a principal $g$th root of unity,

$$\sum_{i=0}^{g-1}\omega^{ji} = \begin{cases} 0 & \text{if } j \not\equiv 0 \pmod g,\\ g & \text{if } j \equiv 0 \pmod g, \end{cases}$$

and the above sum becomes

$$\frac{1}{g}\,a_0P_0\,g = a_0P_0.$$

By the argument leading up to Theorem 6.4.1, $a_0 = 1/\mathrm{ch}'(\lambda_0)$. The argument for Theorem 6.4.2 shows that the limiting matrix has rank 1, and the argument for Theorem 6.5.1 shows that the limiting matrix is strictly positive. □
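The Averaging Theorem can be illustrated with the same kind of computation. The following Python sketch (again with hypothetical rates giving g = 2) averages $(L/\lambda_0)^{k+i}$ over one period and compares the result with $\mathrm{ch}_L(L)/\bigl(\mathrm{ch}_L'(\lambda_0)(L-\lambda_0 I)\bigr)$, where the quotient polynomial is obtained by synthetic division and $\mathrm{ch}_L'(\lambda_0)$ is computed as the value of the quotient at $\lambda_0$ (since $\mathrm{ch}_L(x) = (x-\lambda_0)q(x)$ gives $\mathrm{ch}_L'(\lambda_0) = q(\lambda_0)$).

```python
def matmul(A, B):
    """Classical product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def leslie(f, s):
    n = len(f)
    B = [[0.0] * n for _ in range(n)]
    B[0] = [float(x) for x in f]
    for i in range(n - 1):
        B[i + 1][i] = float(s[i])
    return B

def char_coeffs(f, s):
    """ch(x) = x^n - f1 x^{n-1} - s1 f2 x^{n-2} - ..., highest degree first."""
    c, prod = [1.0], 1.0
    for i, fi in enumerate(f):
        c.append(-prod * fi)
        if i < len(s):
            prod *= s[i]
    return c

def horner(c, x):
    v = 0.0
    for a in c:
        v = v * x + a
    return v

def dominant_root(c):
    lo, hi = 0.0, 1.0 + sum(abs(a) for a in c[1:])
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if horner(c, mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def averaging_limit(L, c, lam):
    """q(L) / q(lambda0) with q(x) = ch(x) / (x - lambda0)."""
    n = len(L)
    q = [1.0]
    for a in c[1:-1]:                      # synthetic division by (x - lam)
        q.append(q[-1] * lam + a)
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for a in q[1:]:                        # Horner's rule with matrix argument
        Q = matmul(Q, L)
        for i in range(n):
            Q[i][i] += a
    dq = horner(q, lam)
    return [[x / dq for x in row] for row in Q]

f, s, g = [0, 2, 0, 3], [0.5, 0.8, 0.6], 2   # hypothetical periodic example
L = leslie(f, s)
c = char_coeffs(f, s)
lam = dominant_root(c)

N = [[x / lam for x in row] for row in L]
Nk = [[float(i == j) for j in range(4)] for i in range(4)]
for _ in range(200):                         # Nk = (L / lambda0)^200
    Nk = matmul(Nk, N)
avg = [[0.0] * 4 for _ in range(4)]
for _ in range(g):                           # add (L/lambda0)^(200+i) / g for i = 0..g-1
    avg = [[a + b / g for a, b in zip(ra, rb)] for ra, rb in zip(avg, Nk)]
    Nk = matmul(Nk, N)

P = averaging_limit(L, c, lam)
err = max(abs(a - b) for ra, rb in zip(avg, P) for a, b in zip(ra, rb))
print(err)
```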
6.7 The Limiting Effect of L on Nonnegative Vectors


Now that we know the limiting form of Leslie matrices, we are in a position to ask about the long-term predictions made by the Leslie model. Specifically, suppose one measures a population and represents this population by the nonnegative vector $X_0$. If one also measures the fertility rates and survival rates and uses these measurements to construct the Leslie matrix L, then under the assumption that these rates remain constant, one can predict that after k time intervals the population should be $L^kX_0$. Of course, it is relatively easy to calculate $L^kX_0$ even for very large k by using a computer. On the other hand, one hopes that the theory developed in the earlier sections of this chapter will lead to predictions that do not require much computation.

Leslie's Convergence Theorem for nonnegative vectors (Theorem 6.2.1) follows immediately from Theorem 6.6.1 on the convergence of primitive matrices. Provided $X_0$ is nonnegative with at least one positive component, $X_k$ converges to a positive multiple of the stable age distribution. The convergence is in the absolute error sense if all eigenvalues, except perhaps $\lambda_0$, are less than 1 in absolute value. In general, convergence is only in the relative error sense. As long as at least one component of $X_0$ is positive, $L^kX_0$ converges to a positive multiple of the unique positive eigenvector that corresponds to $\lambda_0$. Further, if $\lambda_0 > 1$, the positive eigenvector has decreasing components and thus displays the inverted pyramid form: each age class in this eigenvector is less than $1/\lambda_0$ times the previous age class. If $\lambda_0 = 1$, then the components of the eigenvector are nonincreasing, but one needs the assumption that the survival rates are less than 1 to get decreasing age classes. If $\lambda_0 < 1$, then the form of the eigenvector depends on the ratios between the survival rates and $\lambda_0$. In the "usual" case, one assumes $\lambda_0 > 1$ (allowing other eigenvalues to have absolute values greater than 1) and concludes that $L^kX_0$ converges in a relative error sense to an inverted pyramid form.

Theorem 6.5.1 also shows that there is convergence even if $X_0$ is not nonnegative, but the convergence may be to a negative or zero multiple of the stable age distribution. The coefficient calculated in Leslie's Convergence Theorem is the appropriate multiplier.

We now want to consider what happens when we have an imprimitive Leslie matrix. Are oscillations possible? Do oscillations necessarily occur? To see what happens, we apply $L_0$ from Theorem 6.6.2 to $X_0$. First, $S^{-1}$ is applied to $X_0$, which changes the values by dividing them by positive quantities. So a positive component of $X_0$ is not changed into a 0 component, and a 0 component is not changed into a positive component. Next, $\Pi^T$ acts by gathering components into g subvectors. Each subvector consists of those components whose indices were congruent modulo g in the original $X_0$. Because of the decomposed structure of the matrix, each of the subvectors is now treated separately. In fact, each of the subvectors is now acted on by the matrix

$$\frac{\mathrm{ch}_A(A)}{\mathrm{ch}_A'(\lambda_0^g)\,(A-\lambda_0^g I)},$$

which is the limiting form for the primitive Leslie matrix A. So each subvector that contains a nonzero component is taken to a multiple of the unique positive eigenvector of A. Next, the permutation matrix $\Pi$ undoes the permutation caused by $\Pi^T$. Finally, the diagonal matrix S multiplies each component by the appropriate product of survival rates.

There are several immediate conclusions obtained from this computation. Provided there is no $i \in \{1, \ldots, g\}$ for which the components $x_i, x_{i+g}, x_{i+2g}, \ldots$ of $X_0$ are all zero, $L_0X_0$ is strictly positive. This means that as few as g components of $X_0$ need to be positive to force $L_0X_0$ to be positive. On the other hand, if there is such an i, then $L_0X_0$ will have zeros in the locations whose indices are congruent to i mod g.

While it is easy to compute the asymptotic period of a Leslie matrix, it is a little more difficult to compute the asymptotic period of the vector $X_t$, because this vector's period depends not only on whether a component is zero, but also on the actual numerical values of its components. For example, if

$$L = \begin{pmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix},$$

then $L^tX$ has period 3 for most vectors X, but $L^tX$ has period 1 if all the components of X are the same. For instance,

$$L^3\begin{pmatrix}3\\2\\1\end{pmatrix} = L^2\begin{pmatrix}1\\3\\2\end{pmatrix} = L\begin{pmatrix}2\\1\\3\end{pmatrix} = \begin{pmatrix}3\\2\\1\end{pmatrix}, \quad\text{but}\quad L\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}1\\1\\1\end{pmatrix}.$$

For this example, $X_0 = (x_1, x_2, x_3)^T$ leads to a cycle of period 3 iff at least one of $(1, \omega, \omega^2)X_0 = x_1 + \omega x_2 + \omega^2x_3$ and $(1, \omega^2, \omega)X_0 = x_1 + \omega^2x_2 + \omega x_3$ is nonzero, where $\omega$ is a primitive third root of unity.
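The example can be replayed in a few lines of Python. This sketch (our own, not from the text) iterates the cyclic shift L and compares the observed period with the root-of-unity criterion; the tolerance used to decide "nonzero" is an assumption of the sketch.

```python
import cmath

def apply_L(x):
    """L = [[0,0,1],[1,0,0],[0,1,0]] sends (x1, x2, x3) to (x3, x1, x2)."""
    return (x[2], x[0], x[1])

def orbit_period(x):
    """Least t > 0 with L^t x = x (at most 3, since L^3 = I)."""
    y, t = apply_L(x), 1
    while y != x:
        y, t = apply_L(y), t + 1
    return t

def has_period_3(x, tol=1e-12):
    """Criterion from the text: (1, w, w^2).x or (1, w^2, w).x is nonzero."""
    w = cmath.exp(2j * cmath.pi / 3)          # a primitive third root of unity
    a = x[0] + w * x[1] + w ** 2 * x[2]
    b = x[0] + w ** 2 * x[1] + w * x[2]
    return abs(a) > tol or abs(b) > tol

for x in [(3, 2, 1), (1, 1, 1), (5, 5, 2), (0, 0, 7)]:
    print(x, orbit_period(x), has_period_3(x))
```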

More generally, using the partial sums from Horner's Method,

$$R_0(\lambda) = 1, \qquad R_i(\lambda) = \lambda R_{i-1}(\lambda) - s_1\cdots s_{i-1}f_i \quad (1 \le i \le n-1),$$

so that

$$R_{n-1}(\lambda) = \lambda^{n-1} - f_1\lambda^{n-2} - s_1f_2\lambda^{n-3} - \cdots - s_1\cdots s_{n-2}f_{n-1},$$

define the row vectors $R_j$ for $j = 0, \ldots, g-1$ by

$$R_j = \Bigl(1,\ \frac{R_1(\lambda_j)}{s_1},\ \frac{R_2(\lambda_j)}{s_1s_2},\ \ldots,\ \frac{R_{n-1}(\lambda_j)}{s_1\cdots s_{n-1}}\Bigr), \qquad \lambda_j = \lambda_0\omega^j.$$

It is easy to check that $R_jL = \lambda_0\omega^jR_j$ and that $R_jL^g = \lambda_0^gR_j$. Now, $R_jL^tX_0/\lambda_0^t = (R_j\cdot X_0)\,\omega^{jt}$, a periodic function of t. Let us define per(j) as the least p > 0 such that $\omega^{jp} = 1$. In particular, if j = 0, per(j) = 1 and the above function has period 1; that is, it is a constant function. Notice also that $\mathrm{per}(j) \le g$ and per(j) always divides g. Asymptotically, $L^tX_0/\lambda_0^t$ is a sum of periodic functions, and its period is the least common multiple of the periods of the functions in the sum. These considerations lead to the following theorem.


Theorem 6.7.1. If L is a Leslie matrix with period g, then asymptotically $X_t/\lambda_0^t$ is a periodic function of t with period $\mathrm{lcm}\{\mathrm{per}(j) \mid R_j\cdot X_0 \neq 0\}$. This period is a divisor of g. (Here, per(j) is the least p > 0 such that $\omega^{jp} = 1$, where $\omega$ is a principal $g$th root of unity.)
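The following Python sketch illustrates Theorem 6.7.1 under stated assumptions: the Leslie matrix has hypothetical rates with births only at ages 3 and 6 (so g = 3), $\lambda_0$ is found by bisection, the row vectors $R_j$ are built from the Horner partial sums, and a numerical tolerance stands in for the exact test $R_j\cdot X_0 \neq 0$. It uses $\mathrm{per}(j) = g/\gcd(j, g)$, which holds because $\omega^{jp} = 1$ exactly when g divides jp.

```python
import cmath
from math import gcd

def per(j, g):
    """Least p > 0 with omega**(j*p) == 1 for a principal g-th root of unity omega."""
    return g // gcd(j, g)

def lcm(a, b):
    return a * b // gcd(a, b)

def leslie(f, s):
    n = len(f)
    B = [[0.0] * n for _ in range(n)]
    B[0] = [float(x) for x in f]
    for i in range(n - 1):
        B[i + 1][i] = float(s[i])
    return B

def dominant_root(f, s):
    """Positive root of ch(x) = x^n - f1 x^{n-1} - s1 f2 x^{n-2} - ..., by bisection."""
    c, prod = [1.0], 1.0
    for i, fi in enumerate(f):
        c.append(-prod * fi)
        if i < len(s):
            prod *= s[i]
    def ch(x):
        v = 0.0
        for a in c:
            v = v * x + a
        return v
    lo, hi = 0.0, 1.0 + sum(abs(a) for a in c[1:])
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ch(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def left_vector(f, s, lam):
    """R_j = (1, R_1(lam)/s1, R_2(lam)/(s1 s2), ...) from the Horner partial sums."""
    r, R, prod = [1.0 + 0j], 1.0 + 0j, 1.0
    for i in range(1, len(f)):
        R = lam * R - prod * f[i - 1]   # R_i = lam R_{i-1} - s1...s_{i-1} f_i
        prod *= s[i - 1]                # prod is now s1...s_i
        r.append(R / prod)
    return r

def asymptotic_period(f, s, g, x0, tol=1e-9):
    """lcm{per(j) : R_j . X0 != 0}, with 'nonzero' decided by a tolerance."""
    lam0 = dominant_root(f, s)
    omega = cmath.exp(2j * cmath.pi / g)
    p = 1
    for j in range(g):
        Rj = left_vector(f, s, lam0 * omega ** j)
        if abs(sum(a * b for a, b in zip(Rj, x0))) > tol:
            p = lcm(p, per(j, g))
    return p

f = [0, 0, 2, 0, 0, 3]            # hypothetical: births only at ages 3 and 6, so g = 3
s = [0.9, 0.8, 0.7, 0.6, 0.5]
g = 3
print(asymptotic_period(f, s, g, [1, 0, 0, 0, 0, 0]))   # every R_j . X0 equals 1 here
```

Starting from the positive eigenvector for $\lambda_0$ instead gives period 1, because $R_j$ is orthogonal to it for every $j \neq 0$.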


6.7.1 The period of the total population 


Now that we know that $X_t/\lambda_0^t$ becomes periodic, we would like to know whether this periodicity is visible if one observes only the total population rather than each component of the population vector. Here the total population is the sum of the components of $X_t$, and can be written as the inner product $\mathrm{TOTAL} = \langle E, X_t\rangle$, where E is the vector of all 1's. (The standard inner product of two n-component vectors A and B is the sum of the products of the corresponding components; that is, $\sum_{i=1}^{n} A_iB_i$. Here we use $\langle A, B\rangle$ to represent the inner product of A and B.)

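Since we are interested in the asymptotic period, let $\tilde X_t = \sum_{j=0}^{g-1} a_j\omega^{jt}C_j$, where $\lambda_0$ is the positive real eigenvalue of L, $\omega$ is a principal $g$th root of unity, and for $\lambda_j = \lambda_0\omega^j$,

$$C_j = (\lambda_j^{n-1},\ \lambda_j^{n-2}s_1,\ \lambda_j^{n-3}s_1s_2,\ \ldots,\ s_1\cdots s_{n-1})^T.$$

Note that $\tilde X_t$ is a periodic function of t, and $X_t/\lambda_0^t$ approaches $\tilde X_t$. The period of $\tilde X_t$ divides g and depends on which of the a's are nonzero, namely, $\mathrm{per}(\tilde X_t) = \mathrm{lcm}\{\mathrm{per}(j) \mid a_j \neq 0\}$. Similarly,

$$\mathrm{per}(\mathrm{TOTAL}) = \mathrm{per}(\langle E, \tilde X_t\rangle) = \mathrm{lcm}\{\mathrm{per}(j) \mid a_j\langle E, C_j\rangle \neq 0\}$$

and

$$\mathrm{per}(\tilde X_t) = \mathrm{per}(\mathrm{TOTAL}) \text{ for every } X_0 \iff \langle E, C_j\rangle \neq 0 \text{ for all } j = 0, \ldots, g-1.$$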
To see when this will occur, let us define the polynomial $e(\lambda)$ by

$$e(\lambda) = \sum_{i=1}^{n} s_1\cdots s_{i-1}\lambda^{n-i} = \lambda^{n-1} + s_1\lambda^{n-2} + \cdots + s_1\cdots s_{n-1},$$

so that $e(\lambda_j) = \langle E, C_j\rangle$. To show that the inner product $\langle E, C_j\rangle$ is nonzero, we need only show that $e(\lambda)$ does not have $\lambda_j$ as a root. Since each $\lambda_j$ has modulus $\lambda_0$, if we can show that $e(\lambda)$ has no root of modulus $\lambda_0$, then we can conclude that no $\lambda_j$ is a root of $e(\lambda)$. For this, we use the fairly standard trick of multiplying $e(\lambda)$ by $\lambda - \lambda_0$ and find that

$$e(\lambda)(\lambda - \lambda_0) = \lambda^n - (\lambda_0 - s_1)\lambda^{n-1} - (\lambda_0 - s_2)s_1\lambda^{n-2} - \cdots - (\lambda_0 - s_{n-1})s_1\cdots s_{n-2}\lambda - s_1\cdots s_{n-1}\lambda_0,$$

which is a primitive nonnegative polynomial when $\lambda_0 > s_i$ for all $i = 1, \ldots, n-1$. Under this assumption, by Corollary 5.1.5 the polynomial $p(\lambda) = e(\lambda)(\lambda - \lambda_0)$ has a unique positive root, which is strictly dominant. Since $\lambda_0 > 0$ is a root of $p(\lambda)$, $\lambda_0$ must be this root. Hence, all other roots of $p(\lambda)$, which are exactly the roots of $e(\lambda)$, are strictly less than $\lambda_0$ in complex modulus. From this we obtain the following theorem.
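The expansion of $e(\lambda)(\lambda - \lambda_0)$ is easy to get wrong by a sign or an index, so here is a small Python check (the survival rates and $\lambda_0$ are hypothetical; the identity itself holds for any $\lambda_0$). It multiplies the coefficient lists directly and compares them with the expansion, then samples $|e(\lambda)|$ on the circle $|\lambda| = \lambda_0$, which by the argument above should stay away from 0 when $\lambda_0 > \max_i s_i$. Sampling is only a spot check, not a proof.

```python
import cmath

def polymul(a, b):
    """Product of polynomials given as coefficient lists, highest degree first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def e_coeffs(s):
    """e(x) = x^{n-1} + s1 x^{n-2} + s1 s2 x^{n-3} + ... + s1...s_{n-1}."""
    c = [1.0]
    for x in s:
        c.append(c[-1] * x)
    return c

def claimed_product(s, lam0):
    """x^n - (lam0 - s1) x^{n-1} - (lam0 - s2) s1 x^{n-2} - ... - s1...s_{n-1} lam0."""
    out, prod = [1.0], 1.0
    for sk in s:
        out.append(-(lam0 - sk) * prod)
        prod *= sk
    out.append(-prod * lam0)
    return out

def horner(c, x):
    v = 0
    for a in c:
        v = v * x + a
    return v

s = [0.9, 0.5, 0.8, 0.3]      # hypothetical survival rates s1..s4 (n = 5)
lam0 = 1.1                    # hypothetical value above max(s)

lhs = polymul(e_coeffs(s), [1.0, -lam0])
rhs = claimed_product(s, lam0)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))    # agreement up to rounding

# Sample |e| at 720 points of the circle |x| = lam0
min_on_circle = min(abs(horner(e_coeffs(s), lam0 * cmath.exp(2j * cmath.pi * t / 720)))
                    for t in range(720))
print(min_on_circle)
```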


Theorem 6.7.2. For a Leslie model the period of the total population equals the period of the population vector if

$$\lambda_0 > \max\{s_1, s_2, \ldots, s_{n-1}\}.$$


For Leslie models we assume that all survival rates $s_i$ are less than or
equal to 1, and in most applications the further restriction $\lambda_0 > 1$ applies.
So, this theorem suggests that we should see the same period of oscilla- 
tion in both the population vector and the total population. Because this 
theorem gives only a sufficient condition for the periods to be equal, they 
may be the same even if the condition of the theorem is not satisfied. The 
following results, which are proved in Cull and Vogt [44], give some other 
sufficient conditions. 


Corollary 6.7.3. Other sufficient conditions for the period of total population to equal the period of the population vector are:

(a) $\lambda_0 \ge \max\{s_1, s_2, \ldots, s_{n-1}\}$ and $\gcd(\{j \mid \lambda_0 > s_j\} \cup \{g\}) = 1$;

(b) $\lambda_0 < \min\{s_1, s_2, \ldots, s_{n-1}\}$;

(c) $\lambda_0 \le \min\{s_1, s_2, \ldots, s_{n-1}\}$ and $\gcd(\{j \mid \lambda_0 < s_j\} \cup \{g\}) = 1$.


In actual practice, even the total population is difficult to observe, but a 
weighted total population might be observed. For example, the probability 
of observing an organism might be correlated with its size and age. So older 
age classes might be more heavily represented in a sample than younger 
age classes. Such observations can be modeled by replacing E with an
arbitrary nonnegative weight vector W. Arguments similar to the above
can show that W must satisfy a very restrictive set of equations to give a
period different from that of the population vector. Hence it is reasonable 
to expect that the asymptotic periods of the population vector and the 
weighted total population are identical. 


6.8 Afterword 


Let us review briefly what we have done in this chapter. We started with the 
Fibonacci model and generalized it to the Leslie model by allowing more 
than two age classes. Two complications arose: the problem of periodicity, 
and the difficulty of handling survival rates. We first showed that a simple 
gcd condition eliminates the periodic case. For survival rates, we switched 
to companion matrices, analyzed their behavior, and then used similarity 
to transform back to Leslie matrices. 

Our major result is the asymptotic convergence of powers of Leslie ma- 
trices. We showed that these limiting forms are simple enough to be written 
as closed-form expressions. For primitive matrices this form is particularly 
simple, and the limiting matrix is one-dimensional. 

Of course, biologists are more interested in the population vectors that 
they observe than the matrices that they infer. The convergence of pop- 
ulation vectors follows from the convergence of powers of the matrices. 
For primitive matrices, the population vector converges to the stable age 
distribution, and under reasonable circumstances this distribution has the 
inverted pyramid form. In general, population vectors will have an oscil- 
lating limit whose period depends on the period of the matrix and on the 
initial population vector. Analyses of the Leslie model in terms of vectors
rather than matrices can be found in Cull and Vogt [42, 43].

It is worthwhile noting that results in this chapter are closely related to 
results in other chapters. In Chapter 5, we discussed nonnegative differ- 
ence equations. The companion matrices of this chapter are the matrices 
that correspond to nonnegative difference equations. Leslie matrices are 
slightly more general, but since they are related to companion matrices by 
a similarity transformation, Leslie matrices are closely related to nonneg- 
ative difference equations. In Chapter 7, we will discuss matrix difference 
equations, and Leslie matrices will turn out to be a form of nonnegative 
matrices. The graph techniques that we used for Leslie matrices will also 
be used to analyze nonnegative matrices. 

The Leslie model has some limitations. It is a linear model. If, as we 
suspect, biology is nonlinear, we should be cautious about predictions from 
linear models. For example, if a population is growing, the Leslie model pre- 
dicts that the population will grow exponentially. Such growth will eventu- 
ally deplete a population’s resources, and so we do not expect exponential 
growth to continue indefinitely. On the other hand, it may be reasonable 
to think of the Leslie model as a first-order approximation to a nonlinear
growth model. In this case, the linear model may give good predictions in 
the short term, even if its eventual predictions are nonsense. 

As we mentioned, the original Leslie model forces the matrix to be primi- 
tive, and thus it cannot model periodic populations. In Section 6.7.1, based 
on work of Cull and Vogt [44], we considered whether periodic behavior 
would be visible in population totals. We concluded that under reasonable 
circumstances, periodic behavior would still be seen in population totals. 
Since periodicities are not seen in most populations, there is some ques- 
tion about how relevant periodic Leslie models are to biology. Fortunately, 
the example of the periodic cicadas (17-year locusts) shows that there are 
some periodic populations. In general, for periodic models we showed that 
a periodic limit is expected (refer to Theorem 6.6.2). But we also showed 
in the Averaging Theorem (Theorem 6.6.3) that simple Leslie convergence 
can be obtained by taking a suitable average over a period. 

To apply the Leslie model, one must choose a time unit. For some popu- 
lations, the yearly cycle of the environment gives a natural time unit, but 
for other populations the appropriate unit is not obvious. In Cull [31], we 
looked at this problem and showed that inappropriate time units can lead 
to very bad predictions of population growth or decline. 

Finally, we should mention that this chapter only scratches the surface of 
the myriad uses of matrices in population biology. For further information, 
Caswell’s book [25] is a good place to start. 


6.9 Exercises 


Ex 6.1. Give an inductive proof that the Euclidean Algorithm correctly 
computes the greatest common divisor. You should assume that a and b 
are nonnegative integers. You can use one of these two numbers as your 
induction variable. 


Ex 6.2. Show that for a < n and b < n, the Euclidean Algorithm com- 
putes the gcd in time O(log n). 

Hint: Assume that a is the smaller of the two numbers. Consider the situ- 
ation 


$$\gcd(a, b) = \gcd(b \bmod a,\ a) = \gcd(a \bmod [b \bmod a],\ b \bmod a)$$

and show that $a \bmod [b \bmod a] < a/2$.


Ex 6.3. Show that the bound in the previous exercise is worst case optimal 
by giving an infinite sequence of pairs of integers such that the Euclidean 
Algorithm uses $\Omega(\log b)$ steps on the pair (a, b).

Hint: Think Fibonacci. 
Ex 6.4. Investigate the use of absolute values with powers of a matrix by
considering 
— {-1/2 1/2 
le 


and showing which of the following have limits: A*, |A*|, A*/A%, |A*/A‘|, 
|A*|/A¥, A*/|A*|, |A*|/|A¥|. How close is |A®/A§| to its limit? 


Ex 6.5. For the Leslie matrix 


$$L = \begin{pmatrix} 0 & 2 & 0 & 0 & 5\\ 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 \end{pmatrix},$$

determine the least $m_0$ such that $L^{m_0} \gg 0$. Compare your $m_0$ to the upper bound computed from the formula in Theorem 6.1.2.


Ex 6.6. For the Leslie matrix 


$$L = \begin{pmatrix} 0 & 2 & 5 & 3 & 4\\ 1 & 0 & 0 & 0 & 0\\ 0 & 1/2 & 0 & 0 & 0\\ 0 & 0 & 1/2 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 \end{pmatrix},$$

determine the least $m_0$ such that $L^{m_0} \gg 0$. Compare your $m_0$ to the upper bound from the formula in Theorem 6.1.2. How does this differ from Exercise 6.5?


Ex 6.7. For the Leslie matrix 


$$L = \begin{pmatrix} 0 & 0 & 0 & 3 & 5\\ 1 & 0 & 0 & 0 & 0\\ 0 & 1/2 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1/2 & 0 \end{pmatrix},$$

determine the least $m_0$ such that $L^{m_0} \gg 0$. Compare your $m_0$ to the upper bound given by Theorem 6.1.2.


Ex 6.8. Show that if L is a Leslie matrix with only $f_n > 0$ and $f_{n-1} > 0$,
then the least $m_0$ such that $L^{m_0} \gg 0$ is $m_0 = (n-1)^2 + 1$.


Ex 6.9. Create a Leslie matrix L in which $f_i \cdot f_{i+1} = 0$ for all i, but
$\gcd\{i \mid f_i > 0\} = 1$. Compute the least power $m_0$ such that $L^{m_0} \gg 0$.


Ex 6.10. Show that the assumption that $\lambda_1$ is unique is necessary for
convergence by considering
and showing that $\lambda_1 = 1$ is a double eigenvalue of A. Further, show that


Aen gh =; Filgaaelt ih 


and so $A^k/\lambda_1^k$ does not have a finite limit.


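Ex 6.11. Let

$$A = \begin{pmatrix} 1 & 1\\ 1 & 0 \end{pmatrix}$$

be the Fibonacci matrix.
(a) Compute $\mathrm{ch}_A(\lambda)$.
(b) Find the eigenvalue of largest magnitude, $\lambda_1$.
(c) Find $P_{\lambda_1} = \mathrm{ch}_A(A)/(A - \lambda_1 I)$.
(d) Use the formula from Theorem 6.4.1 to compute $\lim_{k\to\infty} A^k/\lambda_1^k$.
(e) Use $\lambda_1 + \lambda_2 = 1$ and $\lambda_1\lambda_2 = -1$ to eliminate $\lambda_2$ from your formula.
(f) The limiting matrix should be

$$\frac{1}{\sqrt{5}}\begin{pmatrix} \lambda_1 & 1\\ 1 & 1/\lambda_1 \end{pmatrix}.$$

Is this consistent with what you know about the asymptotic form for the Fibonacci numbers?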

(g) Calculate the rank of the limiting matrix. 

(h) Find a form for the matrix that clearly displays the rank. (Notice 
that you can multiply each row of a matrix by a different scalar 
by premultiplying the matrix by a diagonal matrix with the scalar 
multipliers down the diagonal. Similarly, the columns of a matrix can 
be scaled by postmultiplying by a diagonal matrix.) 


Ex 6.12. Consider running the Fibonacci sequence in reverse. Use the 
inverse of the Fibonacci matrix. 
(a) Show that the inverse Fibonacci matrix can be transformed by permuting rows and columns to the companion form

$$\begin{pmatrix} -1 & 1\\ 1 & 0 \end{pmatrix}.$$

(b) Show that the eigenvalue with largest magnitude is negative.

(c) Let A be the matrix from (a), and let $\lambda_1$ be the eigenvalue of largest magnitude. Does

$$\lim_{k\to\infty}\frac{A^k}{\lambda_1^k}$$

exist?
(d) Compare


(e) What does (d) tell you about the growth in magnitude of the Fi- 
bonacci numbers with negative indices? 


Ex 6.13. The trace of a square matrix A is defined to be the sum of 
the entries on its main diagonal. If G(A) is the graph associated with a 
nonnegative matrix A, show that the trace of $A^k$ is nonzero iff there is a
cycle of length k in the graph G(A).


Ex 6.14. Assume that you have an O(n*) algorithm that computes the 
characteristic polynomial of an $n \times n$ matrix A. Assume that A is a nonnegative matrix and the characteristic polynomial is $\lambda^n + C_1\lambda^{n-1} + \cdots + C_{n-1}\lambda + C_n$. Use the following lemma to construct an O(n°) algorithm that
determines whether A is primitive. 


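Leverrier's Lemma. Let $C_1, \ldots, C_n$ be the coefficients of the characteristic polynomial of the matrix A. For each i, let $S_i$ be the trace of $A^i$. Then

$$\begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ S_1 & 2 & 0 & \cdots & 0\\ S_2 & S_1 & 3 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ S_{n-1} & \cdots & S_2 & S_1 & n \end{pmatrix}\begin{pmatrix} C_1\\ C_2\\ C_3\\ \vdots\\ C_n \end{pmatrix} = -\begin{pmatrix} S_1\\ S_2\\ S_3\\ \vdots\\ S_n \end{pmatrix}.$$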
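To see the lemma in action, here is a Python sketch (exact rational arithmetic; the names are ours) that computes the traces $S_i$ by repeated multiplication and solves the lower-triangular system by forward substitution, $C_k = -(S_k + S_1C_{k-1} + \cdots + S_{k-1}C_1)/k$. It illustrates the lemma only and is not a solution to the exercise.

```python
from fractions import Fraction

def mat_mult(A, B):
    """Classical product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def traces_of_powers(A, n):
    """[S_1, ..., S_n] with S_i = trace(A^i)."""
    S, P = [], A
    for _ in range(n):
        S.append(sum(P[i][i] for i in range(len(A))))
        P = mat_mult(P, A)
    return S

def leverrier_coeffs(A):
    """C_1..C_n of ch(x) = x^n + C_1 x^{n-1} + ... + C_n, by forward substitution in
    k C_k + S_1 C_{k-1} + ... + S_{k-1} C_1 = -S_k (row k of the lemma's system)."""
    n = len(A)
    S = traces_of_powers(A, n)
    C = []
    for k in range(1, n + 1):
        acc = S[k - 1] + sum(S[k - 1 - i] * C[i - 1] for i in range(1, k))
        C.append(Fraction(-acc, k))
    return C

print(leverrier_coeffs([[1, 1], [1, 0]]))                    # ch(x) = x^2 - x - 1
print(leverrier_coeffs([[1, -1, 1], [1, 0, 0], [0, 1, 0]]))  # ch(x) = x^3 - x^2 + x - 1
```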


Ex 6.15. Prove the Cayley-Hamilton Theorem for companion matrices. 
That is, show that a companion matrix satisfies its characteristic polyno- 
mial. Further, show that this is the lowest-degree polynomial satisfied by 
the matrix. 


Ex 6.16. Consider the Leslie matrix 


Find the eigenvectors and the stable age distribution. Does every initial 
population converge to the stable age distribution? Consider both absolute 
error and relative error, and show that certain vectors converge in one sense 
but not in the other. 


Ex 6.17. Show by example that there are Leslie matrices L such that the
entries of the limiting matrix $\lim_{k\to\infty} L^k/\lambda_0^k$ are not rational functions of
the entries in L.


Ex 6.18. Many of the results demonstrated for Leslie matrices also hold 
for positive matrices, matrices in which every entry is strictly positive. 
Consider the following two positive matrices


6 5 a: 
A=(% a and B=|} Ar 


Show by calculation that for each matrix, nonnegative initial vectors con- 
verge to the positive eigenvector of the matrix. Is this convergence in the 
absolute value or the relative value sense? Does either or both of these 
matrices display the inverted pyramid form expected for Leslie matrices? 


Ex 6.19. Show that the matrix

$$A = \begin{pmatrix} 1 & -1 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}$$

has period 4, but its characteristic polynomial is not periodic. Further, show that for most vectors X, $A^kX$ has period 4 but there are vectors X for which $A^kX$ has period 1. Show that there are no vectors X such that $A^kX$ has period 2.


Ex 6.20. Prove the results about the period of the total population vector 
given in Corollary 6.7.3. 


Ex 6.21. Consider the Leslie matrix

$$L = \begin{pmatrix} 0 & f_2\\ s_1 & 0 \end{pmatrix} \quad\text{with } s_1f_2 = 1.$$

Find conditions on $X_0$ such that $L^tX_0$ oscillates with period 2. Show that there is a weight vector W such that $WL^tX_0$ always has period 1 even if $L^tX_0$ oscillates with period 2.


7


Matrix Difference Equations 


As we saw in Chapter 6 with the Leslie model, elements of a sequence can 
be vectors instead of the usual real or complex numbers. In this chapter we 
consider linear difference equations with matrix multipliers whose solutions 
are sequences of vectors. Such equations are often called matrix difference 
equations because the equations can be written using matrix and vector 
notation. We look at homogeneous equations, including the special case in 
which the matrix is nonnegative, which has applications to Markov chains. 
We discuss the behavior of primitive matrices and use graph theory to give 
an efficient algorithm to determine whether a matrix is primitive. After 
that, we look at nonhomogeneous matrix difference equations and see how 
to reduce them to one-dimensional difference equations. 


7.1 Homogeneous Matrix Equations 


Let us return to the simplest difference equation, $x_{t+1} = ax_t$. When we considered such equations before, $(x_t)$ was usually a sequence of complex numbers and a was a complex constant. It is usual to use capital letters for vectors and matrices instead of the lowercase letters we use for scalars. So, our first-order homogeneous matrix difference equation is

(7.1) $X_{t+1} = MX_t$.

For this equation to make sense, the sizes of the vectors and matrices must agree. Accordingly, the X's are $k \times 1$ column vectors and M is a $k \times k$ matrix, and (7.1) will be called a k-dimensional matrix difference equation.
In printing it is often more convenient to write row vectors, so we also use $X^T$ (the transpose of X) as the row vector corresponding to the column vector X. To write the components of a vector we use

$$X = (x_1, x_2, \ldots, x_k)^T.$$

You may notice that here subscripts serve double duty, both to indicate a component of a vector and to indicate the position of a vector in a sequence. When there is a possibility of confusion we will use function notation for the sequence, for example,

$$X(t) = (x_1(t), x_2(t), \ldots, x_k(t))^T$$

and

$$X(t+1) = (x_1(t+1), x_2(t+1), \ldots, x_k(t+1))^T.$$


Written in component form, equation (7.1) becomes

$$\begin{aligned} x_1(t+1) &= m_{11}x_1(t) + m_{12}x_2(t) + \cdots + m_{1k}x_k(t),\\ x_2(t+1) &= m_{21}x_1(t) + m_{22}x_2(t) + \cdots + m_{2k}x_k(t),\\ &\ \ \vdots\\ x_k(t+1) &= m_{k1}x_1(t) + m_{k2}x_2(t) + \cdots + m_{kk}x_k(t). \end{aligned}$$


As you can see, in this form there is one equation for each component, 
and the next value for each component can depend on the values of all 
components. (This is in contrast to Chapter 2 and Chapter 6, where a 
companion matrix was always used for the multiplier.) While the compo- 
nent form makes the equations appear complicated, the matrix form makes 
the equation appear simple, and in matrix form the solution is also simple, 


(7.2) $X_t = M^tX_0$.


Unfortunately, this solution doesn’t tell us very much, but it does reduce 
the problem of solving a difference equation to the problem of matrix mul- 
tiplication. 

There are at least two natural ways to compute the solution to (7.2). One 
way is to start with $X_0$, compute $X_1$ by the multiplication $MX_0$, compute $X_2$ by $MX_1$, and so forth. This procedure requires about $tk^2$ arithmetic operations to compute $X_t$, because $k^2$ operations are used to compute each matrix-vector product. The other technique is to compute $M^t$ and then compute the one matrix-vector product $M^tX_0$. Using the classical matrix multiplication method (by this we mean using the definition of matrix multiplication directly), computing the product of two matrices uses about $k^3$ operations. (Refer to Chapter 9 for a discussion of faster algorithms for matrix multiplication.) Thus it might seem to take $tk^3$ operations to compute $M^t$, but the technique of repeated squaring (also called fast exponentiation) can save many of these operations. For example, $M^{16}$ can be computed by computing $M^2$, then squaring to get $M^4$, then squaring $M^4$ to get $M^8$, and squaring $M^8$ to get $M^{16}$. In general, $M^t$ can be computed using at most $2\log t$ (here log means the base-2 logarithm) matrix products by using repeated squaring, and as soon as $2\log t$ is less than t, repeated squaring uses fewer arithmetic operations. Further, $2k^3\log t < k^2t$ for large enough t, and for such t, matrix exponentiation by repeated squaring uses fewer operations than the matrix-vector product method.
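Repeated squaring is short to write down. In this Python sketch (plain nested lists with classical multiplication; the names are ours), the exponent is processed bit by bit, and a counter confirms that $M^{16}$ takes four products and that no more than $2\lfloor\log_2 t\rfloor$ products are used.

```python
def mat_mult(A, B):
    """Classical product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(M, t):
    """Return (M**t, number of matrix products) for t >= 1, by repeated squaring."""
    if t < 1:
        raise ValueError("t must be a positive integer")
    result, base, products = None, M, 0
    while True:
        if t & 1:                       # this bit of t is set: fold base into result
            if result is None:
                result = base
            else:
                result = mat_mult(result, base)
                products += 1
        t >>= 1
        if t == 0:
            return result, products
        base = mat_mult(base, base)     # base runs through M, M^2, M^4, M^8, ...
        products += 1

fib = [[1, 1], [1, 0]]
P16, count = mat_power(fib, 16)
print(P16, count)   # [[1597, 987], [987, 610]] 4
```

The squarings account for at most $\lfloor\log_2 t\rfloor$ products, and folding the set bits into the result accounts for at most as many again, which is where the bound $2\log t$ comes from.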

Another method for computing $M^t$ comes from the Cayley-Hamilton Theorem (refer to Appendix C), which says that the characteristic polynomial $\mathrm{ch}_M(x)$ is a polynomial of degree k such that $\mathrm{ch}_M(M) = 0$. This implies that there is a polynomial p(x) of degree at most k such that p(M) = 0. Such a monic polynomial with lowest degree is called the minimal polynomial for M, and we symbolize this polynomial as $\mathrm{min}_M(x)$. For some matrices (like companion matrices) the characteristic polynomial is the minimal polynomial, but for other matrices (like the identity matrices) the minimal polynomial may have much lower degree than the characteristic polynomial.

We can use $M$'s minimal polynomial to compute the powers $M^n$. Setting
$$\mathrm{min}_M(x) = x^d - c_1x^{d-1} - \cdots - c_{d-1}x - c_d,$$
we have
$$M^d = c_1M^{d-1} + \cdots + c_{d-1}M + c_dI,$$
and obviously all powers (with $n \ge d$) have the form
$$M^n = c_1M^{n-1} + \cdots + c_{d-1}M^{n-(d-1)} + c_dM^{n-d}.$$

The key point here is that this formula can be interpreted as the $d$th order
one-dimensional difference equation
$$x_n = c_1x_{n-1} + \cdots + c_{d-1}x_{n-(d-1)} + c_dx_{n-d}$$
with initial conditions $x_0 = I$, $x_1 = M$, $x_2 = M^2$, ..., $x_{d-1} = M^{d-1}$. As
in Chapter 2, the solution to this difference equation can be written in the
form
$$M^n = \sum_{i=1}^{d} A_i\,\phi_i(n),$$
where the $\phi_i(n)$ are $d$ linearly independent solutions to the recurrence.
What are the $A_i$'s? In this case, the $A_i$'s must be matrices, and further,
they must be linear combinations of $I, M, \ldots, M^{d-1}$.
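The recurrence can be run on matrices directly. The Python sketch below assumes that the coefficients $c_1, \ldots, c_d$ of the minimal polynomial are already known; the name `powers_from_recurrence` is hypothetical. Only $M^1, \ldots, M^{d-1}$ are formed by matrix products, and each later power is a linear combination of the $d$ previous ones, costing about $dk^2$ operations instead of $k^3$.

```python
def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def powers_from_recurrence(m, c, n_max):
    """Return [M^0, M^1, ..., M^n_max] for a square matrix m.

    c = [c_1, ..., c_d] are the coefficients of
    min_M(x) = x^d - c_1 x^(d-1) - ... - c_d, assumed known.
    M^1, ..., M^(d-1) (the initial conditions) come from matrix products;
    every later power uses M^n = c_1 M^(n-1) + ... + c_d M^(n-d).
    """
    k, d = len(m), len(c)
    powers = [[[1 if i == j else 0 for j in range(k)] for i in range(k)]]  # M^0 = I
    for _ in range(1, min(d, n_max + 1)):
        powers.append(mat_mul(powers[-1], m))
    for n in range(d, n_max + 1):
        # c[s] is c_(s+1), which multiplies M^(n-(s+1)) = powers[n - 1 - s]
        powers.append([[sum(c[s] * powers[n - 1 - s][i][j] for s in range(d))
                        for j in range(k)]
                       for i in range(k)])
    return powers
```

For example, the matrix $\begin{pmatrix}1&1\\1&0\end{pmatrix}$ has minimal polynomial $x^2 - x - 1$, so one passes `c = [1, 1]`.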

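We consider two examples that may help clarify the foregoing argument.
As in Chapter 6, we can represent the Fibonacci recurrence in matrix form
$$\begin{pmatrix} f_{n+1} \\ f_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} f_n \\ f_{n-1} \end{pmatrix}$$
and treat this as a matrix difference equation. We want to compute powers
of
$$M = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.$$
Here it is easy to check that $M^2 = M + I$, and so $M^n = M^{n-1} + M^{n-2}$.
Since $\lambda^2 - \lambda - 1$ has two distinct roots, $\lambda_1$ and $\lambda_2$, we can write the
solution of this difference equation for $M^n$ as
$$M^n = A_1\lambda_1^n + A_2\lambda_2^n$$
with initial conditions
$$I = M^0 = A_1 + A_2, \qquad M = M^1 = A_1\lambda_1 + A_2\lambda_2,$$
which can be solved to give
$$A_1 = \frac{1}{\lambda_1 - \lambda_2}(M - \lambda_2 I), \qquad A_2 = -\frac{1}{\lambda_1 - \lambda_2}(M - \lambda_1 I),$$
or
$$M^n = \frac{1}{\lambda_1 - \lambda_2}\left[\lambda_1^n(M - \lambda_2 I) - \lambda_2^n(M - \lambda_1 I)\right] = f_nM + f_{n-1}I.$$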
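A quick numerical check of this closed form, sketched in Python (function names are illustrative): it compares $f_nM + f_{n-1}I$ with repeated multiplication and recovers $f_n$, the coefficient of $M$, from the roots $\lambda_{1,2} = (1 \pm \sqrt{5})/2$. The root-based version uses floating point and is rounded, so it is only reliable for moderate $n$.

```python
import math


def fib(n):
    """f_0 = 0, f_1 = 1, f_n = f_(n-1) + f_(n-2)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_matrix_power(n):
    """M^n = f_n M + f_(n-1) I for M = [[1, 1], [1, 0]] and n >= 1."""
    fn, fn1 = fib(n), fib(n - 1)
    return [[fn + fn1, fn], [fn, fn1]]


def fib_from_roots(n):
    """f_n = (lambda_1^n - lambda_2^n) / (lambda_1 - lambda_2), rounded."""
    l1 = (1 + math.sqrt(5)) / 2
    l2 = (1 - math.sqrt(5)) / 2
    return round((l1 ** n - l2 ** n) / (l1 - l2))
```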
As another example of this method, we use the matrix
$$M = \begin{pmatrix} -4 & 9 \\ -4 & 8 \end{pmatrix},$$
where here $\mathrm{min}_M(x) = \mathrm{ch}_M(x) = x^2 - 4x + 4$, which means that we want
(any) two linearly independent solutions of
$$x_n = 4x_{n-1} - 4x_{n-2}.$$
It's easy to verify that
$$x_1(n) = n2^n \quad\text{and}\quad x_2(n) = (n-1)2^{n-1}$$
are two linearly independent solutions to our difference equation. Next we
want to find the coefficient matrices $A_1$ and $A_2$ such that
$$M^n = n2^nA_1 + (n-1)2^{n-1}A_2.$$
For $n = 1$, we have $M = 2A_1$, which gives $A_1 = M/2$. For $n = 2$, we have
$$M^2 = 4M - 4I = 2\cdot 2^2A_1 + 1\cdot 2^1A_2 = 4M + 2A_2,$$
which gives $A_2 = -2I$. So,
$$M^n = n2^{n-1}M - (n-1)2^nI = \begin{pmatrix} (-3n+1)2^n & 9n2^{n-1} \\ -n2^{n+1} & (3n+1)2^n \end{pmatrix}. \tag{7.3}$$
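The same kind of check for the second example, as a sketch assuming $M = \begin{pmatrix}-4&9\\-4&8\end{pmatrix}$ (the matrix obtained by setting $n = 1$ in the entrywise formula): both the coefficient form $n2^{n-1}M - (n-1)2^nI$ and the entrywise form are compared with repeated multiplication in exact integer arithmetic.

```python
def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


M = [[-4, 9], [-4, 8]]   # ch_M(x) = min_M(x) = x^2 - 4x + 4


def via_coefficients(n):
    """M^n = n 2^(n-1) M - (n-1) 2^n I, valid for n >= 1."""
    a, b = n * 2 ** (n - 1), (n - 1) * 2 ** n
    return [[a * M[i][j] - (b if i == j else 0) for j in range(2)] for i in range(2)]


def entrywise(n):
    """The same matrix written out entry by entry (n >= 1)."""
    return [[(-3 * n + 1) * 2 ** n, 9 * n * 2 ** (n - 1)],
            [-n * 2 ** (n + 1), (3 * n + 1) * 2 ** n]]
```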


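Another way to write $M^n$ comes from the Jordan Canonical Form,
whose salient properties are stated in the following theorem (also refer to
Appendix C).

Theorem 7.1.1. For any complex $k \times k$ matrix $M$, there is a nonsingular
matrix $P$ such that
$$P^{-1}MP = \begin{pmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_s \end{pmatrix},$$
where each block $J_i$ is bidiagonal of the form
$$J_i = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix},$$
where $\lambda$ is an eigenvalue of $M$. (In the simplest case each $J_i$ is $1 \times 1$ and
$P^{-1}MP$ is a diagonal matrix.) Moreover, the $n$th power of an $r \times r$ Jordan
block $J$ is the upper triangular matrix
$$J^n = \begin{pmatrix} \lambda^n & \binom{n}{1}\lambda^{n-1} & \cdots & \binom{n}{r-1}\lambda^{n-r+1} \\ 0 & \lambda^n & \cdots & \binom{n}{r-2}\lambda^{n-r+2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^n \end{pmatrix}, \tag{7.4}$$
where the $(i,j)$th entry is $(J^n)_{ij} = \lambda^{n-(j-i)}\binom{n}{j-i}$ for $j \ge i$ and $(J^n)_{ij} = 0$
for $j < i$. (Here $\binom{n}{j-i}$ is the binomial coefficient $\frac{n!}{(j-i)!\,(n-j+i)!}$, taken to be 0 when $j - i > n$.)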
We will not prove this theorem here. Instead, we focus on how this result
can be used to compute $M^n$. If $D$ is a matrix with Jordan blocks
$J_1, J_2, \ldots, J_s$ on its diagonal, then
$$D^n = \begin{pmatrix} J_1^n & & 0 \\ & \ddots & \\ 0 & & J_s^n \end{pmatrix},$$
and we can write down the exact form of $D^n$. Also, $P^{-1}MP = D$, and
rearranging this gives $M = PDP^{-1}$. So to compute $M^n$ we compute
$(PDP^{-1})^n$. This is easy because
$$(PDP^{-1})^2 = PDP^{-1}PDP^{-1} = PD^2P^{-1}$$
and, in general,
$$(PDP^{-1})^n = PD^nP^{-1}.$$
From above we know how to compute $D^n$.
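The binomial formula for the power of a single Jordan block is easy to test. In this minimal Python sketch (names are ours), `jordan_power` builds $J^n$ entry by entry from $\binom{n}{j-i}\lambda^{n-(j-i)}$, treating entries with $j - i > n$ as 0, and exact `Fraction` arithmetic avoids rounding. Assembling $D^n$ from several blocks then only requires placing these matrices on the diagonal.

```python
from fractions import Fraction
from math import comb


def jordan_block(lam, r):
    """r x r block with lam on the diagonal and 1 on the superdiagonal."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(r)]
            for i in range(r)]


def jordan_power(lam, r, n):
    """(J^n)_ij = C(n, j-i) lam^(n-(j-i)) for 0 <= j-i <= n, and 0 otherwise."""
    return [[comb(n, j - i) * lam ** (n - (j - i)) if 0 <= j - i <= n else 0
             for j in range(r)]
            for i in range(r)]


def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]
```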

Notice that $P$ and $P^{-1}$ do not depend on $n$, and so all dependence on
$n$ is in the powers of the Jordan blocks. Thus the entries of $M^n$ are linear
combinations of the entries of the powers of the Jordan blocks. In general,
the Jordan form is difficult to calculate. For most matrices it cannot be
computed exactly, and the approximate calculation is numerically unstable.
In spite of these caveats, the Jordan form method for $M^n$ is useful because
it can be computed for many common examples, and because it allows us
to estimate the growth of $M^n$.
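The Fibonacci example is simple because the matrix $M = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ has
two distinct eigenvalues, which means that its Jordan form is a diagonal
matrix. The columns of the $P$ matrix are the corresponding eigenvectors,
found by solving the equations
$$M\begin{pmatrix} p_{11} \\ p_{21} \end{pmatrix} = \lambda_1\begin{pmatrix} p_{11} \\ p_{21} \end{pmatrix} \quad\text{and}\quad M\begin{pmatrix} p_{12} \\ p_{22} \end{pmatrix} = \lambda_2\begin{pmatrix} p_{12} \\ p_{22} \end{pmatrix}.$$
Taking the eigenvectors $(\lambda_1, 1)^T$ and $(\lambda_2, 1)^T$ gives
$$M^n = \frac{1}{\lambda_1 - \lambda_2}\begin{pmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix}\begin{pmatrix} 1 & -\lambda_2 \\ -1 & \lambda_1 \end{pmatrix} = \begin{pmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{pmatrix}.$$
(Note that we have used $\lambda_1\lambda_2 = -1$ to simplify the expression.) It is easy
to check that this gives the same values as the previous method, although
the expressions may look a little different.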

For our second matrix, $M = \begin{pmatrix} -4 & 9 \\ -4 & 8 \end{pmatrix}$,
the characteristic polynomial is $\mathrm{ch}_M(x) = x^2 - 4x + 4$, which has the double
root $\lambda = 2$. We will show that $M$ is similar to the Jordan block matrix
$$J = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.$$
To show this, we solve the eigenvector equation $Mz = 2z$ to get $z_0 =
(3, 2)^T$. Since every eigenvector is a multiple of $z_0$, we need to find a
generalized eigenvector $y$ satisfying $My = 2y + z_0$. This equation has a
solution $y = (1, 1)^T$, and so for
$$P = \begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix}$$
we have $MP = PJ$, and $M^n = PJ^nP^{-1}$, where $P^{-1} = \begin{pmatrix} 1 & -1 \\ -2 & 3 \end{pmatrix}$. From
(7.4), we know that
$$J^n = \begin{pmatrix} 2^n & n2^{n-1} \\ 0 & 2^n \end{pmatrix}.$$
Multiplying $P$, $J^n$, and $P^{-1}$ gives
$$M^n = \begin{pmatrix} (-3n+1)2^n & 9n2^{n-1} \\ -n2^{n+1} & (3n+1)2^n \end{pmatrix},$$
which is consistent with our calculation in (7.3).
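A sketch that checks this Jordan computation in exact integer arithmetic, with $P$ built from the eigenvector $(3, 2)^T$ and the generalized eigenvector $(1, 1)^T$ (the helper names are ours): it confirms $MP = PJ$ and $PP^{-1} = I$, and compares $PJ^nP^{-1}$ with repeated multiplication.

```python
def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


M = [[-4, 9], [-4, 8]]
J = [[2, 1], [0, 2]]
P = [[3, 1], [2, 1]]          # columns: eigenvector z0 = (3, 2), generalized eigenvector y = (1, 1)
P_INV = [[1, -1], [-2, 3]]    # det P = 1


def j_power(n):
    """J^n = [[2^n, n 2^(n-1)], [0, 2^n]] for n >= 1 (r = 2 case of the binomial formula)."""
    return [[2 ** n, n * 2 ** (n - 1)], [0, 2 ** n]]


def m_power_jordan(n):
    return mat_mul(mat_mul(P, j_power(n)), P_INV)
```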

A comment is in order. These two methods rely on different polynomials.
The first method uses the minimal polynomial, the lowest-degree polynomial
that maps the matrix $M$ to the 0 matrix, while the Jordan method
uses the characteristic polynomial. The characteristic polynomial for a
$k \times k$ matrix always has degree $k$, while the minimal polynomial has
degree at most $k$. (In fact, the minimal polynomial is a divisor, that is, a factor, of
the characteristic polynomial.) It happens that these two polynomials are
different when there are at least two Jordan blocks with the same value of
$\lambda$. (Refer to [78, Section 7.3].) At this point, a simple example may clarify
this. Consider the $k \times k$ identity matrix $I$, which is already in Jordan form
with $k$ Jordan blocks, each with $\lambda = 1$, the only eigenvalue. Here
the characteristic polynomial is $(x - 1)^k$, but the minimal polynomial is
$\mathrm{min}_I(x) = x - 1$, a proper divisor of the characteristic polynomial when
$k > 1$.

We are now ready to estimate the growth of a solution to a matrix difference
equation. For this we use the Jordan form. When $J$ is a Jordan block
of size $r$ and eigenvalue $\lambda \neq 0$, the largest entry of $J^n$ (for large $n$) is $\binom{n}{r-1}\lambda^{n-r+1}$, which has
growth order $O(n^{r-1}|\lambda|^n)$. The Jordan form of a matrix consists of diagonal
Jordan blocks, and each block is raised to the $n$th power independently
of the other blocks. When $D$ is the Jordan form, $D^n$ has an entry that
grows as $O(n^{r-1}|\lambda_1|^n)$, where the absolute value of $\lambda_1$ is maximum among
the absolute values of the eigenvalues and $r$ is the size of the largest Jordan
block that has an eigenvalue of absolute value $|\lambda_1|$. From $M^n = PD^nP^{-1}$,
the power $M^n$ consists of entries that are linear combinations of the entries
of $D^n$, and no entry of $M^n$ can grow faster than the fastest-growing entry
in $D^n$. Unfortunately, this argument does not ensure that $M^n$ has an entry
that grows like the largest entry in $D^n$, because there may be cancellations
that occur in the linear combinations. On the other hand, $D^n = P^{-1}M^nP$, so the largest entry
in $D^n$ is also a linear combination of entries of $M^n$, and by reversing the
argument we see that at least one of the entries in this sum must be as
large as the largest entry in $D^n$, up to a constant factor that depends only on $P$. This observation gives the following result.


Lemma 7.1.2. The largest entry in $M^n$ obeys $\max_{i,j}|(M^n)_{ij}| = O(n^{r_1-1}|\lambda_1|^n)$,
where $\lambda_1$ is an eigenvalue of maximum absolute value and $r_1$ is the size of
the largest Jordan block that has an eigenvalue of absolute value $|\lambda_1|$. This
means that each $(i,j)$ entry satisfies $|(M^n)_{ij}| = O(n^{r_1-1}|\lambda_1|^n)$.
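To see the polynomial factor at work, consider $M = \begin{pmatrix}-4&9\\-4&8\end{pmatrix}$, which has a single $2 \times 2$ Jordan block with $\lambda_1 = 2$, so $r_1 = 2$ and the bound is $O(n\,2^n)$. The sketch below (names are ours) divides the largest entry of $M^n$ by $n\,2^n$; the quotient stays bounded (it equals $9/2$ for every $n \ge 1$, coming from the entry $9n2^{n-1}$), whereas dividing by $2^n$ alone would leave a factor growing like $n$.

```python
from fractions import Fraction


def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


M = [[-4, 9], [-4, 8]]     # one 2 x 2 Jordan block: lambda_1 = 2, r_1 = 2


def growth_ratios(n_max):
    """max_ij |(M^n)_ij| / (n^(r_1 - 1) |lambda_1|^n) = max|entry| / (n 2^n), n = 1..n_max."""
    ratios = []
    p = M
    for n in range(1, n_max + 1):
        biggest = max(abs(x) for row in p for x in row)
        ratios.append(Fraction(biggest, n * 2 ** n))
        p = mat_mul(p, M)
    return ratios
```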


Let us return to solving equation (7.1). We know that the solution is
$X_t = M^tX_0$, but what does this say about the growth of $X_t$? The actual
growth depends on $X_0$. In essence, $X_0$ can pick out some or all of the Jordan
blocks, and so detailed knowledge is necessary to get the actual growth rate.
But $X_t$ cannot grow faster than $M^t$, and we have the following theorem.

Theorem 7.1.3. The growth of a solution to a matrix difference equation
can be bounded from above so that $|x_i(t)|$, $\max_i|x_i(t)|$, and $|\sum_i a_ix_i(t)|$ are
all $O(t^{r_1-1}|\lambda_1|^t)$, where $\lambda_1$ is an eigenvalue of maximum absolute value and
$r_1$ is the size of the largest Jordan block that has an eigenvalue of absolute
value $|\lambda_1|$.


7.2 Nonnegative Matrix Equations 


For a special type of nonnegative matrix $M$, the matrix difference
equation $X_{t+1} = MX_t$ is as simple as the Leslie equations of Chapter 6,
and so it is not much more complicated than a scalar equation. These
special nonnegative matrices have the property that there is a power $M^k$ in
which every entry is strictly positive, which we will write as $M^k \gg 0$; such an
$M$ is called primitive. The associated matrix equation is relatively simple
to solve because of the Perron-Frobenius Theorem.


Theorem 7.2.1 (Perron-Frobenius). If $M$ is a primitive matrix, then:

(a) $M$ has a positive real eigenvalue $\lambda_0$ of maximum modulus;
(b) $\lambda_0$ is a simple root of the characteristic polynomial;
(c) for every other eigenvalue $\lambda_i$, $\lambda_0 > |\lambda_i|$ (it is strictly dominant);
(d) $\min_j \sum_i m_{ij} \le \lambda_0 \le \max_j \sum_i m_{ij}$ and $\min_i \sum_j m_{ij} \le \lambda_0 \le \max_i \sum_j m_{ij}$;
(e) the row and column eigenvectors associated with $\lambda_0$ are strictly positive;
(f) the sequence $M^k$ is asymptotically one-dimensional: its columns converge (in direction) to the column eigenvector associated with $\lambda_0$, and its rows converge (in direction) to the row eigenvector associated with $\lambda_0$;
(g) $\lambda_0 = \max|Mx|/|x|$, where $|x|$ is the Euclidean norm $|x| = \sqrt{\sum x_i^2}$.


We will not prove this theorem; proofs appear in many places, including
[7, Chapter 2] and [146, Chapter 1].
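Items (c) and (f) suggest a simple way to approximate $\lambda_0$ and its column eigenvector numerically: apply $M$ repeatedly to a positive vector and renormalize. This is the power method, which the text does not name; the Python sketch below is ours and assumes $M$ is primitive and given as a list of rows.

```python
def power_iteration(m, steps=200):
    """Estimate the Perron root lambda_0 of a primitive matrix m and its
    column eigenvector (normalized to sum 1) by repeated multiplication."""
    k = len(m)
    x = [1.0 / k] * k                   # strictly positive start, entries sum to 1
    lam = 0.0
    for _ in range(steps):
        y = [sum(m[i][j] * x[j] for j in range(k)) for i in range(k)]
        lam = sum(y)                    # since sum(x) == 1, this is sum(Mx) / sum(x)
        x = [v / lam for v in y]        # renormalize so the entries sum to 1
    return lam, x
```

On the Fibonacci matrix the estimate approaches $(1+\sqrt{5})/2$; on a column-stochastic matrix, such as the Markov matrix of Section 7.2.1, it returns $\lambda_0 = 1$ and the limiting distribution. Renormalizing by the entry sum keeps the iterates from overflowing or underflowing.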


7.2.1 Applications to Markov chains 


An important application of nonnegative matrices is Markov chains, which 
are used as models in the biological, physical, and social sciences. The idea 
of a Markov chain is simple: It has a finite set of states with a set of 
probabilities that describe how a state transitions to other states. Often, 
the states are assumed to have certain initial probabilities, and one asks 
how this probability distribution evolves in time and what its asymptotic 
distribution will be. 

We illustrate this with a Markov chain that has three states called 1,
2, and 3. The probability of being in state $i$ is $p_i$, and the probability of
transitioning among the states is given by
$$P(t+1) = \begin{pmatrix} 1/2 & 1/3 & 0 \\ 1/2 & 1/3 & 1/2 \\ 0 & 1/3 & 1/2 \end{pmatrix} P(t) = MP(t),$$
where $P(t)$ is the current vector of probabilities and $P(t+1)$ is the next
vector of probabilities. Notice that each column of the matrix $M$ sums to
1, and that ensures that the probabilities in each $P(t)$ sum to 1. These
transitions can also be represented by the following labeled graph:

[Figure: labeled transition graph of the three-state chain]

We want to find the long-term behavior of this chain, that is, to calculate
$\lim_{t\to\infty} M^tP(0)$. Since we expect this behavior to be independent of the
initial distribution $P(0)$, we compute $\lim_{t\to\infty} M^t$.
It is a simple matter to compute the characteristic polynomial for this
$M$, $\mathrm{ch}_M(\lambda) = \lambda^3 - \frac{4}{3}\lambda^2 + \frac{1}{4}\lambda + \frac{1}{12}$. By the Cayley-Hamilton Theorem, $M$
satisfies its characteristic polynomial, so for general $t \ge 3$ we have
$$M^t = \frac{1}{12}\left(16M^{t-1} - 3M^{t-2} - M^{t-3}\right),$$
a linear homogeneous difference equation with characteristic roots $1$, $1/2$, $-1/6$.
Therefore, any solution $M^t$ can be expressed as a (matrix) linear combination
of powers of these roots,
$$M^t = C_1(1)^t + C_2(1/2)^t + C_3(-1/6)^t,$$
where the matrices $C_1, C_2, C_3$ depend on the initial conditions. Since the
initial conditions are $I$, $M$, and $M^2$, we expect $C_1, C_2, C_3$ to be linear
combinations of $I$, $M$, and $M^2$. Writing these out in expanded form, we
have
$$\begin{aligned} M^t &= (c_{12}M^2 + c_{11}M + c_{10}I)(1)^t \\ &\quad + (c_{22}M^2 + c_{21}M + c_{20}I)(1/2)^t \\ &\quad + (c_{32}M^2 + c_{31}M + c_{30}I)(-1/6)^t, \end{aligned}$$
where the $c_{ij}$'s are scalars. At first glance it seems we have a problem,
because there are 9 unknown coefficients but only 3 initial-condition equations.
However, the initial conditions are matrices, and when $I$, $M$, and
$M^2$ are linearly independent, we really have 9 equations in scalars. For
example, the matrix equation from the first initial condition gives
$$\begin{aligned} I = M^0 &= 0\cdot M^2 + 0\cdot M + 1\cdot I \\ &= (c_{12}M^2 + c_{11}M + c_{10}I)(1)^0 + (c_{22}M^2 + c_{21}M + c_{20}I)(1/2)^0 \\ &\quad + (c_{32}M^2 + c_{31}M + c_{30}I)(-1/6)^0, \end{aligned}$$
where the linear independence of $I$, $M$, $M^2$ in this example allows us to
equate coefficients of like powers to get
$$0 = c_{12} + c_{22} + c_{32}, \qquad 0 = c_{11} + c_{21} + c_{31}, \qquad 1 = c_{10} + c_{20} + c_{30}.$$
Therefore, each equation in matrices becomes, in this example, three equations
in scalars. It is easy to solve for the $c_{ij}$'s and obtain
$$M^t = \frac{1}{7}(12M^2 - 4M - I) - \left(\frac{1}{2}\right)^{t+1}(6M^2 - 5M - I) - \frac{3}{7}\left(-\frac{1}{6}\right)^{t+1}(18M^2 - 27M + 9I).$$
7.3 Graphs and Matrices 189 
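Taking limits drives both $(1/2)^{t+1}$ and $(-1/6)^{t+1}$ to 0, and so
$$\lim_{t\to\infty} M^t = \frac{1}{7}(12M^2 - 4M - I),$$
which is
$$\lim_{t\to\infty} M^t = \frac{1}{7}\begin{pmatrix} 2 & 2 & 2 \\ 3 & 3 & 3 \\ 2 & 2 & 2 \end{pmatrix}.$$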

Several important results should be noted: 
1. The limiting $M^t$ is a matrix of rank 1.
2. Regardless of the initial probability distribution, the limiting distribution is $(2/7, 3/7, 2/7)^T = (p_1, p_2, p_3)^T$.
3. $\lambda_0 = 1$ is a strictly dominant positive eigenvalue of $M$.
4. The column eigenvector corresponding to 1 is $(2/7, 3/7, 2/7)^T$.
5. The row eigenvector corresponding to 1 is $(1, 1, 1)$.

Since $M^2 \gg 0$, many of these results could have been obtained directly
from the Perron-Frobenius Theorem.
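The closed form for $M^t$ can be checked in exact rational arithmetic. The Python sketch below (helper names are ours) compares it with repeated multiplication for small $t$, and confirms the limiting matrix and the positivity of $M^2$.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


M = [[F(1, 2), F(1, 3), F(0)],
     [F(1, 2), F(1, 3), F(1, 2)],
     [F(0), F(1, 3), F(1, 2)]]
M2 = mat_mul(M, M)


def quad(a, b, c):
    """The matrix a M^2 + b M + c I."""
    return [[a * M2[i][j] + b * M[i][j] + (c if i == j else 0) for j in range(3)]
            for i in range(3)]


def closed_form(t):
    """(1/7)(12M^2 - 4M - I) - (1/2)^(t+1)(6M^2 - 5M - I)
       - (3/7)(-1/6)^(t+1)(18M^2 - 27M + 9I)."""
    t1 = quad(F(12, 7), F(-4, 7), F(-1, 7))
    t2 = quad(6, -5, -1)
    t3 = quad(18, -27, 9)
    h, s = F(1, 2) ** (t + 1), F(-1, 6) ** (t + 1)
    return [[t1[i][j] - h * t2[i][j] - F(3, 7) * s * t3[i][j] for j in range(3)]
            for i in range(3)]
```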


7.3 Graphs and Matrices 


Many problems about nonnegative matrices can be solved by translating 
the problem to a problem about graphs. Graphs are objects that consist 
of vertices and edges between some of these vertices. Small graphs can be 
represented by diagrams that can be visually inspected for various prop- 
erties. Often these visual operations can be codified as algorithms, which 
then can be used to determine properties of graphs that are too large to 
be conveniently drawn. In many cases, these graph-based algorithms can 
determine properties of the original matrix more quickly than algorithms 
based on matrix operations. 

More formally, a (directed) graph $G = (V, E)$ consists of a finite set
$V$ of vertices and an edge set $E$, where $E \subseteq V \times V$, and the (directed)
edge $(v_i, v_j)$ goes from vertex $v_i$ to vertex $v_j$. The graph associated
with the nonnegative matrix $M$ of size $n \times n$ is $G(M)$ with vertex
set $V = \{v_1, v_2, \ldots, v_n\}$, where the directed edge $(v_j, v_i)$ is in $E$ exactly
when $M_{i,j} \neq 0$.¹ A nonnegative matrix $M$ in which all non-zero entries
are replaced by 1's is called the adjacency matrix for $G(M)$.

¹ While $(v_j, v_i)$ corresponding to $M_{i,j}$ may seem backwards, it is the natural definition
when matrix $M$ times vector $X$ is computed as $MX$.
EXAMPLES:
$$M_1 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad G(M_1) = v_1 \leftarrow v_2$$

This correspondence between a nonnegative matrix and a graph gives
a correspondence between matrix multiplication and paths in the graph,
where a path from $v_i$ to $v_j$ is a sequence of vertices $v_i, v_{i_1}, v_{i_2}, \ldots, v_{i_r}, v_j$
that starts at $v_i$, ends at $v_j$, and for each vertex in the sequence there is an
edge to the next vertex in the sequence. The path length is the number
of edges in the path, which is one less than the number of vertices in the
path. For the above graphs, in $G(M_1)$ there is a path of length 1 from $v_2$ to
$v_1$. In $G(M_2)$ there is a path $v_1, v_1, v_2, v_2$ of length 3 from $v_1$ to $v_2$. Notice
also that $G(M_1)$ has no paths of length 2 or greater. Our paths allow a
vertex or an edge to be repeated, even several times. Paths that do not
have repeated vertices are called simple paths.

Let us consider matrix multiplication. The row-times-column rule tells
us that if $C = AB$, then
$$c_{ij} = \sum_k a_{ik}b_{kj}.$$
Specializing this formula to the special case in which both $A$ and $B$ are the
same nonnegative matrix $M$ gives
$$c_{ij} = \sum_k m_{ik}m_{kj}.$$
When is $c_{ij}$ non-zero? By our assumption that $M$ is nonnegative, each summand
is the product of two nonnegative numbers and hence is nonnegative.
So the only way for $c_{ij}$ to be 0 is for every summand to be 0. In our graph
interpretation this means that there is an edge $v_j \to v_i$ in $G(C)$ exactly
when there is at least one $k$ such that there is an edge $v_j \to v_k$ and an
edge $v_k \to v_i$ in $G(M)$. Said another way, there is an edge $v_j \to v_i$ in $G(C)$
exactly when there is a path of length 2 from $v_j$ to $v_i$ in $G(M)$.

This observation tells us that the powers of the matrix $M$ contain information
about the existence or nonexistence of paths in $G(M)$. An even
stronger relationship can be shown if we assume that $M$ is a 0-1 matrix,
a nonnegative matrix in which all positive entries are 1. For a 0-1 matrix
$M$, $\sum_k m_{ik}m_{kj}$ counts the number of paths of length 2 from $v_j$ to $v_i$ in
$G(M)$. This result can be generalized as stated in the following lemma.
Lemma 7.3.1. Let $M$ be a 0-1 matrix. Then the $(i,j)$th entry of the $L$th
power of $M$ counts the number of paths of length $L$ from $v_j$ to $v_i$ in the
associated graph $G(M)$.

Notice that paths rather than simple paths are counted. For example, for
the graph $v_1 \rightleftarrows v_2 \rightleftarrows v_3$ we have
$$M = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad M^3 = \begin{pmatrix} 0 & 2 & 0 \\ 2 & 0 & 2 \\ 0 & 2 & 0 \end{pmatrix},$$
and $M^3$ counts the paths of length 3. In particular, $(M^3)_{21} = 2$ says that
there are two paths of length 3 from $v_1$ to $v_2$. These (nonsimple) paths are
$$v_1 \to v_2 \to v_1 \to v_2 \quad\text{and}\quad v_1 \to v_2 \to v_3 \to v_2.$$
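Lemma 7.3.1 can be checked by brute force. In the sketch below (names are ours; vertices are 0-based, so $v_1$ is index 0), an edge $v_j \to v_i$ is present when `m[i][j] == 1`, matching the convention for $G(M)$; `count_paths` enumerates every vertex sequence with $L$ edges and counts those that follow edges of the graph.

```python
from itertools import product


def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_power(m, L):
    p = [[int(i == j) for j in range(len(m))] for i in range(len(m))]
    for _ in range(L):
        p = mat_mul(p, m)
    return p


def count_paths(m, L, src, dst):
    """Count paths (vertices and edges may repeat) of length L >= 1 from
    vertex src to vertex dst, where u -> w is an edge iff m[w][u] == 1."""
    count = 0
    for middle in product(range(len(m)), repeat=L - 1):
        seq = (src, *middle, dst)
        if all(m[w][u] == 1 for u, w in zip(seq, seq[1:])):
            count += 1
    return count
```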


We can naturally replace each positive entry with a 1 to convert a nonnegative
matrix to the 0-1 matrix that is the adjacency matrix for the
graph $G(M)$. As in the above lemma, matrix multiplication using natural-number
arithmetic counts the number of paths. However, in order to study
the existence or nonexistence of paths between pairs of vertices it is easier
to use Boolean OR as addition and Boolean AND as multiplication.
For the remainder of this section our matrices will be Boolean; that is,
0-1 matrices with Boolean operations.

Other matrix properties also correspond to graph properties. A graph is
strongly connected if for all pairs of vertices $v_i, v_j$ there is a path from
$v_i$ to $v_j$. Since this statement specifies a property for all vertices, it implies
that there is also a path from $v_j$ to $v_i$, but there is no assurance that the
path from $v_i$ to $v_j$ has the same length as the path from $v_j$ to $v_i$. For
example, in the strongly connected graph
$$v_1 \to v_2 \to v_3 \to v_4 \to v_1$$
there is a path of length 1 from $v_1$ to $v_2$, but the length of any path from
$v_2$ to $v_1$ is at least 3.

What matrix property corresponds to strong connectedness in the associated
graph? We have already seen that the entry $(M^L)_{ij}$ is positive iff
there is a path of length $L$ from $v_j$ to $v_i$. So, the existence of a path of
length at most $K$ between every pair of vertices corresponds to
$$\sum_{L=0}^{K} M^L \gg 0.$$
(Recall that $A \gg 0$ means that every entry of the matrix $A$ is positive.)
This formula suggests that an infinite amount of work is involved in using
matrices to check for strong connectedness. However, we are lucky, because
whenever there is a path from $v_i$ to $v_j$, there is a relatively short path, of
length at most $n-1$, from $v_i$ to $v_j$ (a shortest path repeats no vertex, so it has at most $n-1$ edges), and our condition can be amended to
$$G(M) \text{ is strongly connected} \iff \sum_{L=0}^{k} M^L \gg 0 \text{ for some } k \le n-1.$$
This form suggests a calculation for $k = n-1$ using $n-2$ matrix multiplications
and $n-1$ matrix additions, but the calculation can actually
be carried out with fewer matrix operations. For this we need to look at
the calculation slightly differently. One way to recast the calculation is to
notice that for any positive integers $c_0, c_1, \ldots, c_k$,
$$\sum_{L=0}^{k} M^L \gg 0 \iff \sum_{L=0}^{k} c_LM^L \gg 0. \tag{7.5}$$
Which $c_L$'s should we choose to make our calculation easier? Because the
matrices $M$ and $I$ commute, a type of Binomial Theorem holds:
$$(M+I)^k = \sum_{L=0}^{k}\binom{k}{L}M^LI^{k-L} = \sum_{L=0}^{k}\binom{k}{L}M^L.$$
Therefore, choosing $c_L = \binom{k}{L}$ in (7.5) gives
$$\sum_{L=0}^{k} M^L \gg 0 \iff (M+I)^k \gg 0,$$
where $(M+I)^k$ can be computed relatively easily by fast exponentiation.
This calculation can be further simplified by using $k = 2^r$ for $r = \lceil\log_2(n-1)\rceil$
(any $k \ge n-1$ will do, because adding further nonnegative terms to the sum cannot destroy positivity),
and $r$ matrix multiplications (squarings) suffice. Thus, whether a graph
is strongly connected can be decided in time $O(T(n)\log n)$, where $T(n)$ is
the time to compute the product of two $n \times n$ matrices. (We'll discuss matrix
multiplication algorithms more fully in Chapter 9.) The best currently
known value for $T(n)$ is $O(n^\alpha)$, where $\alpha \approx 2.38$ (see [29, 28] for details), and
this matrix method therefore decides strong connectedness in $O(n^\alpha\log n)$ time.
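A sketch of the squaring test for strong connectedness, using Boolean matrices stored as lists of rows (names are ours; plain triple-loop multiplication stands in for a fast $T(n)$ algorithm). It forms $M + I$ with OR as addition, squares until the exponent $k = 2^r$ reaches $n - 1$, and checks that every entry is true.

```python
def bool_mul(a, b):
    """Boolean matrix product: OR plays the role of +, AND the role of *."""
    n = len(a)
    return [[any(a[i][l] and b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def strongly_connected_by_squaring(m):
    """Decide whether G(m) is strongly connected by testing (M + I)^k >> 0,
    where k = 2^r is the first power of two with k >= n - 1."""
    n = len(m)
    b = [[bool(m[i][j]) or i == j for j in range(n)] for i in range(n)]  # M + I
    k = 1
    while k < n - 1:
        b = bool_mul(b, b)      # (M + I)^k  ->  (M + I)^(2k)
        k *= 2
    return all(all(row) for row in b)
```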
7.3.1 Next node representation


A matrix may have many zero entries, which can be considered to correspond
to non-edges, edges that do not exist in the graph. A more compact
representation might avoid representing these non-edges and represent only
edges that actually exist. Concentrating on one vertex $v$, the edges from
$v$ tell us which vertices can be reached from $v$ in one step. So, we could
represent a graph by a collection of sets with one set for each vertex $v$;
namely, the set that contains the vertices that can be reached in one step
from $v$. The adjacency matrix of the graph actually is such a representation,
because the column corresponding to the vertex $w$ is a bit vector
representation of the set of vertices that can be reached in one step from
$w$. (A bit vector represents a subset of $\{1, \ldots, n\}$ by a 0-1 vector with $n$
components, where $j$ is in the subset iff the $j$th bit in the vector is 1.)
The matrix representation uses $n^2$ bits. Is a more compact representation
possible? Yes, when there are not too many edges. This new representation
is called the next node representation. For each vertex $v$ there is a list,
and each item in the list is a vertex that can be reached from $v$ in one
step. There is a separate next node list for each vertex. Since the vertices
have labels from $\{1, \ldots, n\}$, the name of each vertex can be represented
in $O(\log n)$ bits. Because there is one item in this array of lists for each
edge in the graph, the whole structure can be represented in $O(|E|\log n)$
bits, where $|E|$ is the number of edges in the graph. When the implied constants
are ignored, the next node representation is smaller than the matrix
representation, provided $|E|$ is less than $n^2/\log n$.

We'd like to use this next node representation to determine strong con- 
nectedness more quickly. When a graph is strongly connected, for any vertex 
v there is a path from v to every vertex and also a path from every vertex to 
v. Conversely, if there is a vertex v such that there is a path from v to every 
vertex and also a path from every vertex to v, then the graph is strongly 
connected. The reason for this is that for any pair of vertices, say w and 
z, there is a path from w to v and a path from v to z, and following these 
two paths in order gives a path from w to z. These observations suggest a 
strong connectedness algorithm. Pick an arbitrary vertex v and do a search 
to find all the vertices that can be reached (with a path of any length) 
from v, and then do a “backward” search to find all the vertices that can 
reach v. By “reach” we mean that there is a path in the graph that can be 
followed from the starting vertex to the vertex to be “reached”. 

With the next node representation these searches are easy. One starts at 
any vertex v and puts all of the next nodes of v into a structure. Then one 
takes a vertex, say w, from the structure and puts into the structure all of 
w’s next nodes that have not previously been put into the structure. This 
search halts when the structure is empty. The search is a success if each 
vertex has been put into the structure. It is easy to keep track of which 
vertices have been in the structure by using a bit array. 
This sort of search can be done in $O(|E|)$ time if we assume that checking
and putting a vertex into the structure are unit operations. For the
forward search, the next node representation suffices, and we can find the
next node to place in the structure by following a pointer and then checking
the bit array to determine whether this node has already been put in
the structure. For the backward search, a previous node representation is
needed. Of course, this previous node representation is just the next node
representation for the graph in which every edge $v_i \to v_j$ has been turned
around into the edge $v_j \to v_i$. This can be computed quickly by running
through the next node list for $v$, say, and putting $v$ on the previous
node list for each $w$ that is on the next node list of $v$. At the end of these
two searches one has to check that all vertices have been found, and a bit
vector representation for found vertices makes these checks easy. Overall,
each search costs $O(|E|)$, and there are two $O(n)$ checks. So, assuming that
$|E| \ge n$, strong connectedness can be determined in $O(|E|)$ time.
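The two searches can be sketched in Python with next node lists, where `next_nodes[v]` lists the vertices reachable from `v` in one step (0-based labels; names are ours). This sketch uses a FIFO queue as the structure, which makes each search breadth-first, and, as a small variant of the description above, marks the start vertex as found at once, since it reaches itself by a path of length 0.

```python
from collections import deque


def reach(next_nodes, start):
    """Mark every vertex reachable from start; a FIFO queue plays 'the structure'."""
    found = [False] * len(next_nodes)   # the bit array of vertices already seen
    found[start] = True
    queue = deque([start])
    while queue:
        w = queue.popleft()
        for x in next_nodes[w]:
            if not found[x]:
                found[x] = True
                queue.append(x)
    return found


def previous_nodes(next_nodes):
    """Turn every edge v -> w around: put v on the previous-node list of w."""
    prev = [[] for _ in next_nodes]
    for v, targets in enumerate(next_nodes):
        for w in targets:
            prev[w].append(v)
    return prev


def strongly_connected(next_nodes):
    """Forward search and backward search from vertex 0 must both find everything."""
    if not next_nodes:
        return True
    return all(reach(next_nodes, 0)) and all(reach(previous_nodes(next_nodes), 0))
```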

In Section 7.4 we will show that this sort of “graph thinking” leads to 
an efficient algorithm to determine whether a matrix is primitive. 


7.3.2 Comments on imprimitivity 


We are interested in primitive matrices, because by the Perron-Frobenius
Theorem the difference equations associated with such matrices asymptotically
have one-dimensional behavior for nonnegative initial conditions. This
is the same result we found for Leslie matrices in Theorem 6.6.1. We also
saw that if the characteristic polynomial of a Leslie matrix is not primitive,
then the Leslie system is asymptotically periodic with period $g = \gcd\{i \mid f_i > 0\}$. For example,
if $A$ is a $6 \times 6$ Leslie matrix in which all survival rates equal 1 and whose
only positive fertility rates are $f_2$ and $f_6$, then $\mathrm{ch}_A(x) = x^6 - f_2x^4 - f_6$
and $g = 2$. The graph corresponding to $A$ has the survival edges
$v_1 \to v_2 \to v_3 \to v_4 \to v_5 \to v_6$ and the fertility edges $v_2 \to v_1$ and $v_6 \to v_1$,
and the graph corresponding to $A^2$ has the edges
$$v_1 \to v_1,\quad v_1 \to v_3 \to v_5 \to v_1 \qquad\text{and}\qquad v_2 \to v_2,\quad v_2 \to v_4 \to v_6 \to v_2.$$


These graphs tell us that the population decomposes into two separate 
populations, one consisting of the odd-numbered components and one con- 
sisting of the even-numbered components. If we use the original matrix, 
these two populations change places at each step: the odds become evens 
and the evens become odds. On the other hand, if we use the square of the 
original matrix, the populations remain separate: the odds stay as odds and 
the evens stay as evens. Using the squared matrix corresponds to looking 
at the populations at every two time units instead of looking at them at 
every time step. For our example, squaring the original $6 \times 6$ matrix and
taking only the components in odd rows and odd columns gives the $3 \times 3$
matrix
$$\begin{pmatrix} f_2 & 0 & f_6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$


The same $3 \times 3$ matrix is obtained by taking only the even rows and columns.
These matrices are identical because the survival rates are all 1. If unequal
survival rates are used, these matrices are
$$\begin{pmatrix} s_1f_2 & 0 & s_5f_6 \\ s_1s_2 & 0 & 0 \\ 0 & s_3s_4 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} s_1f_2 & 0 & s_1f_6 \\ s_2s_3 & 0 & 0 \\ 0 & s_4s_5 & 0 \end{pmatrix},$$
which look slightly different, but they have the same characteristic polynomial,
$x^3 - s_1f_2x^2 - s_1s_2s_3s_4s_5f_6$.
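These block matrices can be verified numerically. The sketch below (names are ours) builds a $6 \times 6$ Leslie matrix with only $f_2, f_6 > 0$, squares it, and checks that odd and even indices do not mix, the entries of the two $3 \times 3$ blocks, and that both blocks have the characteristic polynomial $x^3 - s_1f_2x^2 - s_1s_2s_3s_4s_5f_6$. The survival and fertility values in the test are made up for illustration.

```python
from fractions import Fraction as F


def leslie(fertility, survival):
    """n x n Leslie matrix: f_1..f_n in the first row, s_1..s_(n-1) on the subdiagonal."""
    n = len(fertility)
    a = [[F(0)] * n for _ in range(n)]
    a[0] = [F(f) for f in fertility]
    for i, s in enumerate(survival):
        a[i + 1][i] = F(s)
    return a


def mat_mul(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def sub(a, idx):
    """Keep only the rows and columns listed in idx."""
    return [[a[i][j] for j in idx] for i in idx]


def charpoly3(b):
    """Coefficients [1, c2, c1, c0] of det(xI - b) = x^3 + c2 x^2 + c1 x + c0."""
    tr = b[0][0] + b[1][1] + b[2][2]
    minors = (b[0][0] * b[1][1] - b[0][1] * b[1][0]
              + b[0][0] * b[2][2] - b[0][2] * b[2][0]
              + b[1][1] * b[2][2] - b[1][2] * b[2][1])
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    return [1, -tr, minors, -det]
```

With $s_i = i/(i+1)$, $f_2 = 3$, and $f_6 = 5$, both blocks give the coefficients $1, -3/2, 0, -5/6$.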

In what follows, we call a matrix strongly connected if its graph is
strongly connected. In general, if a strongly connected matrix $M$ is not
primitive, then the matrix $M^g$ decomposes into $g$ primitive matrices, where
$g$ is the greatest common divisor (gcd) of the cycle lengths in the graph
$G(M)$. This decomposition is not as regular as the decomposition for Leslie
matrices. In particular, the number of vertices in each component need not
be the same, and the decomposition does not have to be periodic across
the indices of the matrix. For example, the graph

[Figure: a strongly connected graph on the vertices 1 through 7]


is strongly connected with g = 3. The third power of the corresponding 
matrix decomposes using the three sets of indices {1,5}, {2,6}, {3, 4, 7}. 
The graphs for this decomposition are 


[Figure: the graphs of the three components]


and the corresponding decomposed matrices are 


[Figure: the three decomposed matrices]


where the +’s indicate the positive entries. 
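For a strongly connected graph, $g$ can be computed without listing cycles. A standard technique, which we add here and which the text does not describe, runs a breadth-first search from one vertex and takes the gcd of $\mathrm{level}(u) + 1 - \mathrm{level}(w)$ over all edges $u \to w$; the classes that $M^g$ separates are then the vertices grouped by level modulo $g$. A Python sketch with 0-based next node lists (names are ours), assuming the graph is strongly connected and has at least one edge:

```python
from collections import deque
from math import gcd


def period_and_classes(next_nodes):
    """Return (g, classes) for a strongly connected graph given by next-node lists.

    BFS from vertex 0 assigns levels; each edge u -> w contributes
    level[u] + 1 - level[w] to a running gcd. Vertex v lies in class level[v] mod g.
    """
    level = [None] * len(next_nodes)
    level[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in next_nodes[u]:
            if level[w] is None:
                level[w] = level[u] + 1
                queue.append(w)
    g = 0
    for u, targets in enumerate(next_nodes):
        for w in targets:
            g = gcd(g, level[u] + 1 - level[w])
    classes = [[v for v in range(len(next_nodes)) if level[v] % g == c]
               for c in range(g)]
    return g, classes
```

On the Leslie graph of the $6 \times 6$ example above it returns $g = 2$, with the odd-numbered and even-numbered vertices as the two classes.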

Nonnegative matrices can also fail to be primitive because they are not
strongly connected. For Leslie matrices this is no great problem. If an $n \times n$
Leslie matrix has $f_k$ with $k < n$ as its last positive fertility rate, then one
can analyze the $k \times k$ submatrix with the first $k$ rows and the first $k$ columns.
If this $k \times k$ matrix is primitive, then its powers converge as specified by
the theorems for Leslie matrices in Chapter 6. The last $n - k$ columns of
the $n \times n$ matrix converge in $n - k$ steps to columns consisting solely of 0's.
The first $k$ components in the solution vector $X$ converge as usual for a
$k \times k$ Leslie system. The last $n - k$ components of $X$ converge like the first $k$
components, but these last components are also multiplied by appropriate
products of survival rates. The periodic Leslie case is similar. If $g$ is the
index of imprimitivity, using the $g$th power of the Leslie matrix decomposes
the system into $g$ Leslie systems with $(k/g) \times (k/g)$ aperiodic submatrices. These
systems behave as just described.


The case for general nonnegative matrices is, unfortunately, more complicated,
because there are a myriad of ways in which primitive and periodic
blocks may be connected. Let us discuss some of these possibilities in terms
of the graphs associated with the matrix. Of course, this short description
is insufficient to give more than a flavor of the possible complications. In
a simple situation, the graph may be disconnected, with a partition of the
vertex set into several subsets in which there is no path from a vertex in
one subset to a vertex in another subset. In this situation, the disconnected
components can be analyzed separately. But a strange thing can happen.
One of the components could have an oscillation of period $p$, while another
could have an oscillation of period $q$. Viewed together as a single system,
there is an oscillation of period $\operatorname{lcm}(p, q)$. This multiplication of periods
makes it possible for a system to have a period much greater than the system's
dimension. This is very different from strongly connected systems, in
which the period of any oscillation cannot exceed the dimension.

Let us now consider the connected case, in which there is a path in at
least one direction between every pair of vertices. For example, the graph
$v_1 \to v_2$ is connected, since there is a directed path from $v_1$ to $v_2$. Of
course, this example is only connected and not strongly connected. To
analyze connected graphs, we make use of the equivalence relation that
specifies that two vertices $v_i$ and $v_j$ are equivalent iff there is a directed path
from $v_i$ to $v_j$ and a directed path from $v_j$ to $v_i$. The partition induced
by this equivalence relation breaks the graph into strongly connected
blocks, where the vertices in a block and the edges within a block form a
strongly connected graph. We construct a new graph that has as its vertices
these strongly connected blocks and that has an edge from block $B_i$ to
block $B_j$ iff there is a vertex in $B_i$ that has an edge to a vertex in $B_j$. This
new graph is called a DAG, a directed acyclic graph, because it has no
directed cycles (every directed cycle is inside a block) and the edges between
blocks have direction. Two special kinds of blocks need to be singled out:
the sources and the sinks. A source is a block with no in-coming edges, 
whereas a sink is a block with no out-going edges. In any system, the initial 
conditions determine which blocks are important. A block is active if the 
initial condition for at least one component in the block is positive. (Recall 
that we are dealing only with nonnegative systems with nonnegative initial 
conditions.) A block is eventually active if either the block is initially 
active or there is a directed path to the block from an initially active block. 
Of particular interest are the eventually active sinks. In analogy with fluid
flow, we expect the sinks to contain all of the asymptotic behavior. If we
consider the initial conditions to be an initial distribution of fluid in various
containers corresponding to blocks, we expect the fluid to flow downhill
and end up in the sinks. Certainly, if a block consists of a single vertex
with an out-going edge, we expect the fluid to flow out of this chamber.
Of course, there might be an in-coming flow that would keep fluid in the
chamber, but we expect that all fluid eventually flows out. When a block
has cycles, we expect the fluid to flow around these cycles. But if there
is an out-going edge, we expect some fluid to flow out, and even the part
of the fluid that is recirculating should eventually hit the out-going edge
and flow out. So, at least asymptotically, we expect the eventual behavior
of the system to be determined by the eventually active sinks. As before,
this set of sinks behaves as a disconnected system (unless there is only
one eventually active sink), and we can analyze the behavior as we did for
disconnected systems.

There is one thing wrong with this picture: it is valid
only in restricted situations, since the analogy to fluid flow makes sense only
if fluid is neither created nor destroyed. If we assume that our system is
a Markov chain, then the fluid flow analogy does make sense, because the
total probability always remains equal to 1. Even in the Markov case we
have passed over some difficulties. The asymptotic probability distribution
within a sink depends only on the submatrix for the sink if that submatrix
is primitive, but the probability of being in a particular sink depends on
the initial conditions over the whole system. Further, when the sink is
periodic, the maximum period is determined by the sink's submatrix, but
the actual period depends on the initial conditions across the whole matrix.
If the system is not a Markov chain, then blocks can have eigenvalues
larger than 1, which correspond to the creation of fluid, and some vertices
may have no out-going edges (not even self-loops), which corresponds to the
destruction of fluid. The analysis of such systems has to take into account both the
graphical properties of the matrix and the actual numerical values in the
matrix. So, we leave such systems with the comment that their analyses
are complicated.
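The block decomposition and the notion of an eventually active sink can be made concrete with a short Python sketch. This is an illustration under stated assumptions, not the book's procedure: vertices are numbered 0, ..., n−1, edges are (u, w) pairs, and the blocks are found naively from reachability sets (roughly O(n·|E|) work) rather than with a linear-time strongly-connected-components algorithm. The names `reachability` and `eventually_active_sinks` are hypothetical.

```python
from collections import defaultdict


def reachability(n, edges):
    """R[v] = set of vertices reachable from v by a directed path (v itself included)."""
    succ = defaultdict(list)
    for u, w in edges:
        succ[u].append(w)
    R = []
    for s in range(n):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in succ[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        R.append(seen)
    return R


def eventually_active_sinks(n, edges, x0):
    """Return the sink blocks reachable from some component with x0[v] > 0."""
    R = reachability(n, edges)
    # Strongly connected blocks: v and w are equivalent iff each reaches the other.
    block_of, blocks = {}, []
    for v in range(n):
        if v not in block_of:
            block = frozenset(w for w in R[v] if v in R[w])
            for w in block:
                block_of[w] = len(blocks)
            blocks.append(block)
    # Edges of the DAG join distinct blocks.
    dag = {(block_of[u], block_of[w]) for u, w in edges if block_of[u] != block_of[w]}
    has_out_edge = {a for a, _ in dag}
    sinks = [b for b in range(len(blocks)) if b not in has_out_edge]
    # A block is eventually active iff one of its vertices is reachable
    # from an initially active component.
    reached = set()
    for v in range(n):
        if x0[v] > 0:
            reached |= R[v]
    return [blocks[b] for b in sinks if blocks[b] & reached]
```

For example, with edges `[(0, 1), (1, 0), (1, 2), (2, 2), (0, 3), (3, 4), (4, 3), (5, 5)]` and only component 0 initially positive, the sinks are {2}, {3, 4}, and {5}, but only the first two are eventually active; the sink {5} never receives any "fluid".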


7.4 Algorithms for Primitivity 


In this section we investigate algorithms for determining whether a non- 
negative matrix is primitive. 


7.4.1 Algorithm I 


The most straightforward algorithm for primitivity is based on the
observation that if a power of a nonnegative matrix is positive, then all higher
powers are also positive. Using graph theory, we show that the $(n-1)^2+1$
power of any primitive matrix is strictly positive.


Theorem 7.4.1. If A is a primitive n × n matrix and $m_0$ is the least
nonnegative integer m such that $A^m > 0$, then $m_0$ obeys

$$m_0 \le (n-2)l + n \le (n-1)^2 + 1,$$

where l is the length of the shortest cycle in the graph G(A).
Proof. If n = 1, then $A = [a_{11}]$ with $a_{11} > 0$, and the corresponding
graph consists of a single vertex with a self-loop. For such a graph, l = 1.
Since $A^0 = [1]$, the formulas are correct. We next note that for n > 1,
l = n implies that the graph is a single n-cycle, so A has the zero pattern of a
permutation matrix and cannot be primitive.
Therefore, we may assume that n > 1 and $l \le n-1$.

Consider a cycle of length l. Since the graph is strongly connected and
l < n, there is a vertex v on this cycle that has an edge to a vertex not
on this cycle. Because A is primitive, for every large enough m there is a
path of length m between every pair of vertices, and in particular, m can
be taken in the form il + 1. Let $S_i$ be the set of all vertices that can be
reached from the vertex v in exactly il + 1 steps. Clearly, $|S_0| \ge 2$ and
$S_0 \subseteq S_1 \subseteq \cdots \subseteq S_{n-2}$, since one can first go once around
the cycle of length l, returning to v. Also, $S_{i+1}$ is exactly the set of vertices
reachable in l steps from $S_i$, so if $S_{i+1} = S_i$, then $S_{i+2} = S_{i+1} = S_i$. Therefore,
either $|S_i| = n$ or

$$2 \le |S_0| < |S_1| < \cdots < |S_i|.$$

This implies $|S_{n-2}| = n$, and every vertex is reachable from v in (n−2)l + 1
steps. If y is any vertex, strong connectedness implies the existence of an
in-coming edge from some x to y. Since x is reachable from v in (n−2)l + 1
steps, y is reachable from v in (n−2)l + 2 steps. From this argument we
see that for every j ≥ 1 every y is reachable from v in (n−2)l + j steps.
To go from any vertex x to any vertex y, one can go from x to v and then
go from v to y. But there is a path from x to v of some length n − j with
j ≥ 1 (a shortest path has length at most n − 1), which means that one can go
from x to y in (n − j) + (n−2)l + j = (n−2)l + n steps. This gives the first
upper bound. The second upper bound follows from $l \le n-1$. □
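The bound $(n-1)^2+1$ is attained. A standard example (usually credited to Wielandt) is the graph consisting of the cycle 0 → 1 → ... → n−1 → 0 plus the chord n−1 → 1, so that the shortest cycle has length l = n − 1 and $(n-2)l + n = (n-1)^2 + 1$. The Python sketch below computes the least m ≥ 1 with $A^m > 0$ by brute-force Boolean powering; the names `bool_mult`, `exponent`, and `wielandt` are hypothetical helpers introduced only for this illustration.

```python
def bool_mult(A, B):
    """Boolean matrix product: (AB)[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exponent(A, limit):
    """Least m >= 1 with A^m > 0, checked up to m = limit; None if not found."""
    P = [[bool(x) for x in row] for row in A]
    for m in range(1, limit + 1):
        if all(all(row) for row in P):
            return m
        P = bool_mult(P, A)
    return None


def wielandt(n):
    """Cycle 0 -> 1 -> ... -> n-1 -> 0 plus the chord n-1 -> 1 (cycles of length n and n-1)."""
    A = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1
    A[n - 1][0] = 1
    A[n - 1][1] = 1
    return A


for n in range(2, 7):
    print(n, exponent(wielandt(n), (n - 1) ** 2 + 1), (n - 1) ** 2 + 1)
```

For these graphs the computed exponent coincides with $(n-1)^2+1$, so the upper bound in Theorem 7.4.1 cannot be lowered in general.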


So the (theoretically) simple algorithm is to compute $A^{(n-1)^2+1}$ and
check whether all entries in this computed matrix are positive. The
analysis of this algorithm is also simple. Matrix multiplications are the most
time-consuming part, so let M(n) be the time to compute the product of
two n × n matrices. If we compute the power of A by simple powering as
in

    POWER := A
    FOR I := 2 TO (n-1)^2+1 DO
        POWER := POWER × A

then our algorithm uses $O(n^2)$ matrix multiplications, and its time
complexity is $O(n^2 M(n))$. But most of these multiplications can be avoided by
fast exponentiation (repeated squaring), in which POWER is multiplied by
POWER rather than by the original matrix A.
    POWER := A
    I := 1
    WHILE I < (n-1)^2+1 DO
        I := 2 × I
        POWER := POWER × POWER


In the last procedure, the power of A is doubled on each execution of the
WHILE loop. So, if the loop is executed j times, then POWER contains
$A^{2^j}$ and I contains $2^j$. From the loop condition, we have $2^j \ge (n-1)^2+1$
and $2^{j-1} < (n-1)^2+1$, so $2^j \le 2(n-1)^2$. Hence $2\log_2(n-1)+1 \ge j > 2\log_2(n-1)$,
and the repeated squaring method has time complexity $O(M(n)\log n)$, which is
superior to $O(n^2 M(n))$.
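The repeated-squaring version of Algorithm I can be sketched in Python on the 0/1 pattern of A, using Boolean OR/AND instead of arithmetic. This is a minimal sketch, assuming A is given as a list of rows of nonnegative numbers; `is_primitive_by_squaring` is a hypothetical name, and the comments map each line to the pseudocode above.

```python
def bool_mult(A, B):
    """Boolean product: (AB)[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_primitive_by_squaring(A):
    """Algorithm I with repeated squaring on the zero pattern of A."""
    n = len(A)
    power = [[x > 0 for x in row] for row in A]   # POWER := A (as a Boolean matrix)
    target = (n - 1) ** 2 + 1
    i = 1                                          # I := 1
    while i < target:                              # WHILE I < (n-1)^2 + 1 DO
        i *= 2                                     #   I := 2 * I
        power = bool_mult(power, power)            #   POWER := POWER * POWER
    # power now holds the pattern of A^i with i >= (n-1)^2 + 1.
    return all(all(row) for row in power)


print(is_primitive_by_squaring([[0, 1], [1, 1]]))   # primitive
print(is_primitive_by_squaring([[0, 1], [1, 0]]))   # a permutation matrix
```

Because a positive power stays positive, overshooting the target exponent (i is a power of 2, possibly larger than $(n-1)^2+1$) does no harm.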

At this point we should say more about M(n), the time complexity of
multiplying two n × n matrices. The classical row-times-column algorithm
uses $O(n^3)$ arithmetic operations (additions and multiplications). Is this
the best possible? That depends on what kind of entries are in your
matrices and on what kind of operations you are allowed to use. As we saw
earlier, when we are trying to determine if a power of a nonnegative
matrix is positive, it might be reasonable to change the matrix to a Boolean
matrix by writing 1 for each positive entry and using Boolean addition and
multiplication, corresponding to OR and AND. When the operations on
scalars are restricted to AND and OR, it is known (refer to [112, V. 2,
pp. 159–168]) that any computer program for (Boolean) matrix multiplication
with no branching has time complexity $\Omega(n^3)$, so the classical method is
the best possible! (If one allows branching, the Four Russians' algorithm
computes the logical product of two matrices in $O(n^3/\log n)$ time. See Aho,
Hopcroft, Ullman [2].)

This may require a small explanation. The operations OR and
AND, like addition and multiplication of nonnegative numbers, are monotone
in the sense that $X \ge Y \Rightarrow f(X) \ge f(Y)$, where X and Y are vectors. If ≥ is defined on the
scalars, we can extend the ordering from scalars to vectors by requiring
that the scalar ordering hold on each component of the vectors. Notice
that even if the scalar ordering is a total ordering, the vector ordering may
only be (and usually is) a partial ordering. For example, {0, 1} is totally
ordered by ≥ defined as

$$0 \ge 0, \quad 1 \ge 0, \quad \text{and} \quad 1 \ge 1,$$

but the componentwise extension does not specify any relationship between

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

Similarly, when the usual ordering ≥ on the reals is extended to vectors,
we get

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \ge \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \iff x_1 \ge y_1 \text{ and } x_2 \ge y_2,$$

and there is no relationship when $x_1 > y_1$ and $y_2 > x_2$. When we discuss the
usual sort of operations for f, X and Y are vectors with two components
and f(X) and f(Y) are scalars, so we could rewrite the monotonicity
condition as

$$x_1 \ge y_1 \text{ and } x_2 \ge y_2 \Rightarrow f(x_1, x_2) \ge f(y_1, y_2).$$

For example,

$$x_1 \ge y_1 \text{ and } x_2 \ge y_2 \Rightarrow (x_1 \text{ OR } x_2) \ge (y_1 \text{ OR } y_2).$$
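Monotonicity of a two-argument operation on {0, 1} can be checked exhaustively, which makes the distinction concrete: OR, AND, addition, and multiplication pass, while subtraction fails. This is a small illustrative sketch; `is_monotone` is a hypothetical helper, and the check only covers the finite set of values passed in.

```python
from itertools import product


def is_monotone(f, values=(0, 1)):
    """Check x1 >= y1 and x2 >= y2  =>  f(x1, x2) >= f(y1, y2) over all pairs of points."""
    points = list(product(values, repeat=2))
    return all(f(*x) >= f(*y)
               for x in points for y in points
               if x[0] >= y[0] and x[1] >= y[1])


print(is_monotone(lambda a, b: a | b))   # OR
print(is_monotone(lambda a, b: a & b))   # AND
print(is_monotone(lambda a, b: a - b))   # subtraction: (1, 1) >= (1, 0) but 0 < 1
```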


Now the question is: Will using nonmonotonic operations allow matrix
multiplication to be calculated more quickly? The answer is YES, as shown
by Strassen [156] in 1969. (We will return to Strassen's method in Chapter
9.) He showed that by using the nonmonotonic operation of matrix
subtraction he could produce a divide-and-conquer algorithm for matrix
multiplication that uses only seven half-size multiplications, rather than the
eight half-size multiplications in a divide-and-conquer algorithm that uses
only addition and multiplication. Strassen's algorithm has time complexity
$\Theta(n^{\log_2 7})$, where $\log_2 7 \approx 2.81$. A number of researchers have found very clever ways to further
reduce the complexity of matrix multiplication. At present, the best
algorithm has time complexity $O(n^\alpha)$, where α is less than 2.4 [29, 28]. These
faster matrix multiplication methods could be used to produce a faster
algorithm for determining whether a matrix is primitive. Of course, to use
the methods requiring subtraction, computation must be carried out in the
integers. While this adds some complexity because the size of the integers
could increase, it can be shown that the increase is not very significant.
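Strassen's seven-product recursion can be sketched in a few lines of Python. This is an illustration, not a tuned implementation: it assumes square integer matrices whose size is a power of 2 (no padding), uses plain nested lists, and recurses all the way down to 1 × 1 blocks instead of switching to the classical method for small sizes. The helper names are hypothetical. Note the uses of `sub`, the nonmonotonic operation discussed above.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A):
    """Split a 2h x 2h matrix into four h x h blocks A11, A12, A21, A22."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen(A, B):
    """Product of two n x n matrices, n a power of 2, using 7 half-size products."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)      # A11 B11 + A12 B21
    C12 = add(M3, M5)                        # A11 B12 + A12 B22
    C21 = add(M2, M4)                        # A21 B11 + A22 B21
    C22 = add(sub(M1, M2), add(M3, M6))      # A21 B12 + A22 B22
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

Applied to 0/1 matrices, the result is an integer matrix whose positive entries are exactly the 1 entries of the Boolean product, which is why the computation must be done in the integers.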

Are these “faster” methods practical? Since complexity analysis is only 
asymptotic, it is very possible that a “faster” method is slower than stan- 
dard methods for all reasonably sized problems. For Strassen’s method, 
n > 1000 is needed to be competitive with the classical method. For other 
nonmonotonic methods, much larger values of n seem to be needed for 
there to be any speedup over the classical method. On the other hand, 
the Four Russians’ method may be competitive for reasonable values of 
n (say, n ≈ 100) if some of the operations are implemented as bit vector
operations. 
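The bit-vector idea can be illustrated by storing each row of a Boolean matrix as a Python integer, so that row i of the product is the OR of those rows of B selected by the 1 bits of row i of A. This is only the bit-vector ingredient, not the full Four Russians method (which additionally precomputes the ORs of all subsets of small groups of rows in lookup tables). Function names are hypothetical.

```python
def to_bit_rows(M):
    """Encode each 0/1 row as an integer: bit j is set iff M[i][j] is nonzero."""
    return [sum(1 << j for j, x in enumerate(row) if x) for row in M]


def bool_mult_bits(A_rows, B_rows):
    """Row i of the Boolean product is the OR of the rows k of B with A[i][k] = 1."""
    C = []
    for a in A_rows:
        acc, k = 0, 0
        while a:
            if a & 1:
                acc |= B_rows[k]   # one word-level OR replaces n scalar ORs
            a >>= 1
            k += 1
        C.append(acc)
    return C


A = to_bit_rows([[0, 1, 0], [0, 0, 1], [1, 1, 0]])
print(A, bool_mult_bits(A, A))
```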




7.4.2 Algorithm II


After all this talk about matrix multiplication, are there other methods
for testing primitivity that don't use matrix multiplication? Yes, there are
methods that are based on the calculation of graph properties. The basic
technique used in these methods is depth-first search.² In this technique,
one explores all edges in a graph by following each edge leading to an
unused vertex and backing up when there are no edges to unused vertices.
The complexity of depth-first search is O(|E|), where |E| is the number of
edges in the graph. In our matrix application, |E| is the number of positive
entries in the matrix, and so O(|E|) is $O(n^2)$.

To determine whether a directed graph is strongly connected, two
depth-first searches are performed, both starting from the same vertex: one on
the graph and one on the reversed
graph. (Recall that in the reversed graph each directed edge is turned
around so that the tail of the original edge becomes the head of the new
edge, and the head of the original edge becomes the tail of the new edge.)
The graph is strongly connected iff both these searches find all the vertices,
which means that strong connectedness can be tested in O(|E|).
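The two-search test for strong connectedness can be sketched as follows. This is a minimal illustration, assuming vertices 0, ..., n−1 with n ≥ 1, edges given as (u, w) pairs, and vertex 0 as the common starting point; it uses an iterative stack-based search (whose visiting order differs from recursive depth-first search, which does not matter for reachability). Function names are hypothetical.

```python
def reachable(n, adj, s):
    """Boolean list marking every vertex reachable from s via the adjacency lists adj."""
    seen = [False] * n
    seen[s] = True
    stack = [s]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if not seen[w]:
                seen[w] = True
                stack.append(w)
    return seen


def is_strongly_connected(n, edges):
    fwd = [[] for _ in range(n)]
    rev = [[] for _ in range(n)]        # reversed graph: every edge turned around
    for u, w in edges:
        fwd[u].append(w)
        rev[w].append(u)
    # 0 reaches everything, and everything reaches 0.
    return all(reachable(n, fwd, 0)) and all(reachable(n, rev, 0))
```

Each search touches every edge at most once, so the test runs in time linear in the size of the graph.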

If M is nonnegative and a power $M^k$ is strictly positive, the
corresponding graph has at least one path of length k between every pair of vertices.
In particular, for every vertex there is a cycle of length k from that vertex
back to itself. Further, since $M^{k+1}$ must also be strictly positive, there is
a cycle of length k + 1 around each vertex. These observations lead to the
following theorem.


Theorem 7.4.2. A nonnegative matrix M is primitive iff the correspond- 
ing graph G(M) is strongly connected and has two relatively prime cycle 
lengths. 


Proof. Since k and k + 1 are relatively prime, the above observations show
the only if part. For the if part, suppose there are two cycles whose lengths
are relatively prime. Then every integer greater than or equal to some integer B
can be represented as a nonnegative integer combination of the lengths of these
two cycles. (See Exercise 5.5.) Since the graph is strongly connected, there are
paths between every pair of vertices. We claim that there exists a path of length
k = 3(n−1) + B (where the graph has n vertices) between any pair of vertices. This
will show that $M^k$ is strictly positive. A path from any v to any w can be
constructed by going from v to a vertex $x_1$ on the first cycle, then around this
cycle as many times as you like, then from $x_1$ to some vertex $x_2$ on the second
cycle, then around the second cycle as often as you want, and finally from
$x_2$ to w. The paths from v to $x_1$, $x_1$ to $x_2$, and $x_2$ to w can be chosen with
total length d at most 3(n−1). Since k − d ≥ B, the number k − d can be written
as a sum of copies of the two relatively prime cycle lengths, so going around the
two cycles the appropriate number of times results in a path of length exactly
d + (k − d) = k. This proves $M^k > 0$. □

² A breadth-first search could be used here in place of the depth-first search.


This theorem is easy to misread. It does not say that there are
two simple cycles whose lengths are relatively prime. For example, you can
construct (refer to Exercise 7.2) a strongly connected graph on 20 vertices
that has only three simple cycles, of lengths 6, 15, and 20. Although no two
of these simple cycle lengths are relatively prime, the corresponding
matrix is still primitive, because you can produce a cycle of length 21 by
going around the 6-cycle and the 15-cycle, and 21 and 20 are relatively prime.
The correct corollary is the following:


Corollary 7.4.3. A nonnegative matrix is primitive iff the corresponding 
graph is strongly connected and the gcd of the lengths of its simple cycles 
is 1. (We call such a graph a primitive graph.)
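The arithmetic behind the 20-vertex example can be checked directly: no pair of the simple cycle lengths 6, 15, and 20 is relatively prime, yet their gcd is 1. The snippet below uses only the standard library; the variable names are illustrative.

```python
from functools import reduce
from itertools import combinations
from math import gcd

lengths = [6, 15, 20]          # simple cycle lengths from the example above
pairwise = {(a, b): gcd(a, b) for a, b in combinations(lengths, 2)}
overall = reduce(gcd, lengths)
print(pairwise)                # no pair is relatively prime
print(overall)                 # but the gcd of all three lengths is 1
print(gcd(6 + 15, 20))         # the closed walk of length 21 against the 20-cycle
```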


The relatively prime cycle lengths may be relatively large and hard to 
find. Because of this, our strategy is to find some small simple cycles and 
then to check whether the gcd of their lengths is 1. Ideally, we would find 
the lengths of all simple cycles, but this seems computationally difficult. 
In particular, determining whether an n-vertex graph has a simple cycle 
of length n is the famous Hamiltonian circuit problem (refer to [68]), 
which is known to be NP-complete. So, our procedure looks for any short 
cycles, not just simple ones. 

We start by picking an arbitrary vertex v, and then doing a backward 
search and a forward search from v. As seen above, this pair of searches 
checks whether the graph is strongly connected because each vertex appears 
in both the forward search and in the backward search iff the graph is 
strongly connected. 

Once we know that the graph is strongly connected, we need a procedure
for finding cycle lengths. Consider our search to be breadth-first, so for each
vertex we can talk of a successor set of vertices and a predecessor set of
vertices. For each vertex w, let P(w) be the length of the shortest path
from w to a fixed vertex v. Since there is a path of length 0 from v to v,
we initialize with P(v) = 0, and P(w) is calculated by doing a backward
breadth-first search from v, which can be done in O(|E|). For the forward
search, define $S_0 = \{v\}$ and inductively calculate

$$S_{i+1} = \mathrm{Succ}(S_i) - \text{Already Visited Vertices}.$$

Of course, here we are using Succ(X) to mean the set of vertices that can
be reached by a path of length 1 (an edge) from a vertex in the set X.
We remove the Already Visited Vertices so that edges are not repeated
and also so that this forward search can be accomplished in O(|E|) time.
For each $w \in \mathrm{Succ}(S_i)$, put P(w) + i + 1 into the set of cycle lengths C.
(This number is the length of a closed walk that goes from v to a vertex of
$S_i$ in i steps, takes one edge to w, and returns to v along a shortest path.)
We now argue that the gcd of the final (finite) set C is the gcd of all
cycle lengths. For this we use the ± closure of a set C of natural numbers.
This closure, denoted by $C^{\pm}$, is the smallest set that both contains C and
is closed under addition and differences. This means that if $a \in C^{\pm}$ and
$b \in C^{\pm}$, then both a + b and |a − b| are in $C^{\pm}$.


Lemma 7.4.4. If C is a finite set of natural numbers, then $\gcd(C) = \gcd(C^{\pm})$
holds. Also, if g is this common value, then $C^{\pm} = g\mathbb{N}$; that is, $C^{\pm}$
consists of all nonnegative multiples of g.


Proof. The elements of $C^{\pm}$ can be assigned types indicating the least
number of operations needed to create the element from the original elements
in C. Elements of C are type-0 elements of $C^{\pm}$, and x is an element of type
K if x = a + b or x = |a − b|, where one of a and b has type K − 1 and
the other has type at most K − 1. By hypothesis, g is the gcd of type-0
elements. Assume that g is the gcd of elements of type at most K − 1. If g
divides both a and b, then g also divides their sum and their difference, and
hence g divides all elements of type at most K. Because adding elements
to a set cannot increase the gcd, g must be the gcd of elements of type
at most K. Since every element of $C^{\pm}$ has finite type, $\gcd(C^{\pm}) = g$.
Why is $C^{\pm} = g\mathbb{N}$? First of all, since $g = \gcd(C^{\pm})$, all elements of $C^{\pm}$ are
multiples of g. On the other hand, by the Euclidean Algorithm we know
that $g \in C^{\pm}$, and by closure under addition all positive multiples of g must
then be in $C^{\pm}$. (Closure under subtraction guarantees that 0 is in $C^{\pm}$.) □
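Lemma 7.4.4 can be checked on small examples by computing the part of $C^{\pm}$ lying in [0, bound] as a fixed point. This is a sketch under an assumption worth stating: `bound` must be at least max(C), so that the subtractive Euclidean Algorithm (which never produces a number larger than its inputs) can run inside the bounded range. Under that assumption the result should be exactly the multiples of gcd(C) up to `bound`. The name `pm_closure` is hypothetical.

```python
from functools import reduce
from math import gcd


def pm_closure(C, bound):
    """Elements of C^{+-} in [0, bound], assuming bound >= max(C)."""
    S = set(C)
    changed = True
    while changed:
        changed = False
        for a in list(S):
            for b in list(S):
                for x in (a + b, abs(a - b)):   # closure under sums and differences
                    if x <= bound and x not in S:
                        S.add(x)
                        changed = True
    return S


C = [6, 9]
g = reduce(gcd, C)
print(sorted(pm_closure(C, 20)))   # the multiples of g = 3 up to 20
```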


We will prove that all cycle lengths are in $C^{\pm}$, where C is the output
of our primitivity algorithm. This will allow us to find the greatest
common divisor of all cycle lengths by calculating gcd(C).


Lemma 7.4.5. If C is the set of numbers found by the primitivity
algorithm, then every cycle length is in $C^{\pm}$, and therefore gcd(C) is the greatest
common divisor of all cycle lengths.


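Proof. Consider any cycle, simple or not simple, and let $v_1, v_2, \ldots, v_L$ be
the ordered list of the vertices in this cycle of length L, with indices taken
modulo L (so $v_{L+1} = v_1$). For every $v_i$, there is an edge $v_i \to v_{i+1}$. If
$P(v_i)$ is the position at which $v_i$ is found in the backward search, and
$Q(v_i)$ is the position at which $v_i$ is first found in the forward search (so
that $v_i \in S_{Q(v_i)}$), then both

$$P(v_i) + Q(v_i) \quad\text{and}\quad P(v_{i+1}) + Q(v_i) + 1$$

have been put into C. The first is there because $v_i$ was first found as a
successor of a vertex in $S_{Q(v_i)-1}$ (if $v_i = v$, this number is 0, which lies
in $C^{\pm}$ anyway), and the second because the edge $v_i \to v_{i+1}$ makes $v_{i+1}$ a
successor of a vertex in $S_{Q(v_i)}$. Since $C^{\pm}$ is closed under addition and
differences, the following natural number must be in $C^{\pm}$:

$$\left| \sum_{i=1}^{L} \bigl[P(v_{i+1}) + Q(v_i) + 1\bigr] - \sum_{i=1}^{L} \bigl[P(v_i) + Q(v_i)\bigr] \right|.$$

The P terms cancel because the indices run once around the cycle, so this
number is L, the cycle length. Thus every cycle length is in $C^{\pm}$ and, by
Lemma 7.4.4, is a multiple of gcd(C). Conversely, each element P(w) + i + 1 of C
is the length of a closed walk through v, and a closed walk decomposes into
cycles, so the gcd of all cycle lengths divides every element of C. Therefore
gcd(C) is the gcd of all cycle lengths. □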
Theorem 7.4.6. If C is the set of numbers found by the primitivity
algorithm, then a strongly connected graph is primitive iff gcd(C) = 1.

Proof. By Lemma 7.4.5, gcd(C) is the gcd of all cycle lengths, so it is enough
to recall from Theorem 7.4.2 and Corollary 7.4.3 that a strongly connected
graph is primitive iff the gcd of its cycle lengths is 1. □
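The whole of Algorithm II can be sketched in Python. Assumptions, stated plainly: vertices are 0, ..., n−1, edges are (u, w) pairs, and v defaults to vertex 0. Instead of building the layers $S_i$ explicitly, the sketch uses forward breadth-first distances Q(u), which equal the index i of the layer containing u, and loops over the edges once, adding Q(u) + 1 + P(w) to C for each edge u → w. The names `bfs_distances` and `is_primitive_graph` are hypothetical.

```python
from collections import deque
from functools import reduce
from math import gcd


def bfs_distances(n, adj, s):
    """Breadth-first distances from s along adjacency lists adj (None if unreachable)."""
    dist = [None] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] is None:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_primitive_graph(n, edges, v=0):
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
    P = bfs_distances(n, pred, v)   # P[w]: shortest path length from w to v
    Q = bfs_distances(n, succ, v)   # Q[u]: index i of the layer S_i containing u
    if None in P or None in Q:
        return False                # not strongly connected
    # Each edge u -> w closes a walk v -> ... -> u -> w -> ... -> v of length Q[u] + 1 + P[w].
    C = [Q[u] + 1 + P[w] for u in range(n) for w in succ[u]]
    return reduce(gcd, C, 0) == 1
```

For instance, a 2-cycle and a 3-cycle sharing vertex 0 give a primitive graph, while a lone 3-cycle does not (every element of C is then 3).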


To complete an analysis of this algorithm we need to look at the com- 
plexity of calculating the gcd of a set of numbers. A naive way to compute 
the gcd is to begin with the two smallest numbers, use the Euclidean Al- 
gorithm to find their gcd, and replace these two elements with their gcd. 
This process can be applied recursively to the remaining set, and there are 
O(n) calls to the Euclidean Algorithm when n is the size of the set. If all 
elements are less than 2n (as they are in our set C), each call to the Eu- 
clidean Algorithm could use O(log n) divisions, and in total this procedure 
is O(n log n). This is an overestimate. On each division, either the division
is exact (and the larger number is eliminated) or one of the numbers is
decreased. In fact, if the division is not exact, then the smaller number
is at least halved in every two divisions. Since the smallest number has
only log n bits, it is reduced to 1 once there are 2 log n divisions that are
not exact. This means that the gcd of an n-subset of {1, ..., 2n} can be
computed in O(n) time. Coupled with the above, this gives the following
two theorems.


Theorem 7.4.7. Whether an n-vertex graph is primitive can be determined
in O(|E| + n) time, where |E| is the number of edges in the graph.


Theorem 7.4.8. Whether an n × n nonnegative matrix is primitive can be
determined in $O(n^2)$ time. Except for an initial scan of the matrix, the algorithm
runs in O(|E|) time, where |E| is the number of nonzero entries in the matrix.
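The running-gcd computation discussed before these theorems can be sketched with an explicit Euclidean Algorithm that also counts division steps, so the "mostly exact divisions" behavior can be observed on examples. Two assumptions differ from the text: the numbers are processed in the order given rather than smallest first, and the loop stops early once the gcd reaches 1 (it cannot get smaller). The name `gcd_of_set` is hypothetical.

```python
def gcd_of_set(values):
    """Running gcd via the Euclidean Algorithm; returns (gcd, number of division steps)."""
    g, steps = 0, 0
    for c in values:
        a, b = g, c
        while b:
            a, b = b, a % b      # one division with remainder
            steps += 1
        g = a
        if g == 1:
            break                # the gcd cannot drop below 1
    return g, steps


print(gcd_of_set([12, 18, 30]))
print(gcd_of_set([6, 15, 20]))
```

Once the running gcd g is small, most later calls need only one or two divisions (for example, a remainder 0 immediately), which is the source of the linear overall bound.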


We finish this section with a couple of comments. It seems that if a graph 
has many edges, then the graph is most likely primitive. For instance, with 
some effort one could calculate a constant γ and then show that the graph
must be primitive if $|E| > \gamma n^2$. It can also be shown [10] that there is a
constant α such that if each vertex has at least α log n edges, then the graph
is almost surely primitive when the edges are assigned at random. Finally,
our algorithm lets us pick v. Which v should be chosen? A reasonable choice
is the v with the largest number of edges. In graphs that arise in
modeling, the structure of the graph may allow a fast proof of primitivity 
without even using our algorithm. The point of all these comments is that 
our complexity estimates may be overestimates. 
7.5 Matrix Difference Equations with Input


In the previous sections of this chapter we have considered matrix difference
equations of the form

$$X_{t+1} = MX_t.$$

These are called homogeneous because multiplying each component of
$X_t$ by a results in each component of $X_{t+1}$ being multiplied by a. The
solutions to such equations also satisfy the additive property in that if
$X_{t+1} = MX_t$ and $W_{t+1} = MW_t$, then $Z_t = X_t + W_t$ also satisfies $Z_{t+1} = MZ_t$.
(Notice that while the recurrence stays the same under addition, the
initial conditions change, because if $X_0 = C$ and $W_0 = C$, then
$Z_0 = X_0 + W_0 = 2C \ne C$ if C is nonzero.) Homogeneous difference equations
are used to model systems with no input, where the current state of the 
system depends only on the previous state of the system. In contrast, other 
systems are better described by nonhomogeneous equations, because the 
state of the system depends on an input as well as on the internal workings. 
For example, the states of a sewage system depend on what is fed into the 
system as well as on the internal processing within the system. 

We describe nonhomogeneous systems by a matrix difference equation of
the form

$$X_{t+1} = MX_t + Y_t, \tag{7.6}$$


but in many fields more complicated models are used. For example, control
engineers often use a pair of equations,

$$X_{t+1} = MX_t + BY_t,$$
$$Z_{t+1} = CX_t + DY_t,$$

where $X_t$ is the internal state, $Y_t$ is the input, and $Z_{t+1}$ is the output.
They then ask such questions as what input sequence produces a desired 
output sequence. Of course, even more complicated models (including non- 
linear models) are used in a variety of fields. Here we restrict ourselves to 
equations in the form of (7.6). 

Our first result is a trivial observation, which may seem surprising. 


Theorem 7.5.1. If $X_{t+1} = MX_t + Y_t$, then the input sequence $Y_t$ can be
chosen so that the solution is any desired sequence.


Proof. If $X_0$ is the initial condition and $X_1$ is the next desired value, taking
$Y_0 = -MX_0 + X_1$ gives $MX_0 + Y_0 = MX_0 - MX_0 + X_1 = X_1$. Similarly,
if $X_t$ was the last value and $X_{t+1}$ is the next desired value, taking
$Y_t = -MX_t + X_{t+1}$ gives $X_{t+1}$, the desired value. □


This result may seem surprising because linear systems are so simple
that one expects them to have very limited possibilities, yet every desired
sequence can be produced by a linear system. On the
other hand, the result is trivial because it says that if you can produce
any desired input sequence (any $Y_t$), then you can pass your constructed
sequence (which contains all the complexity) through a linear system and
obtain whatever you like. The linear system does not add any complexity
or flexibility; rather, you had that when you were constructing the input
sequence.
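The construction in the proof of Theorem 7.5.1 is a one-liner per step: $Y_t = X_{t+1} - MX_t$. The sketch below builds these inputs for an arbitrary target sequence and then runs the system to confirm that it reproduces the target. Matrices and vectors are plain Python lists; `inputs_for` and `simulate` are hypothetical names.

```python
def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def inputs_for(M, desired):
    """Y_t = X_{t+1} - M X_t for t = 0, ..., len(desired) - 2."""
    return [[a - b for a, b in zip(desired[t + 1], mat_vec(M, desired[t]))]
            for t in range(len(desired) - 1)]


def simulate(M, x0, inputs):
    """Iterate X_{t+1} = M X_t + Y_t and return [X_0, X_1, ...]."""
    xs = [list(x0)]
    for y in inputs:
        xs.append([a + b for a, b in zip(mat_vec(M, xs[-1]), y)])
    return xs


M = [[2, 1], [0, 3]]
desired = [[1, 0], [5, -2], [0, 0], [7, 7]]    # any sequence we like
print(simulate(M, desired[0], inputs_for(M, desired)) == desired)
```

As the text says, all the "design" is in the input sequence; the linear system merely passes it through.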

A major virtue of linear systems is that we can write down their solution.
The solution to (7.6) with initial condition $X_0$ is

$$X_t = M^t X_0 + \sum_{i=0}^{t-1} M^{t-1-i} Y_i.$$
Notice that this is formally the same as the solution to the first-order 
one-dimensional difference equation 


$$x_{t+1} = \lambda x_t + y_t,$$


but the symbols are now vectors and matrices. This change is significant 
because the order of operands is now important. For example, a k x k 
matrix times a k x 1 vector must be multiplied in the order matrix times 
vector. Of course, this form of solution is purely formal and doesn’t really 
tell us much about how the solution behaves. After we give the solution, 
we must add some hypotheses about the structure of the matrices and the 
growth of the input sequence in order to get some bounds on the behavior 
of the solution. 
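The closed-form solution can be checked against direct iteration. The sketch below evaluates $M^t X_0 + \sum_{i=0}^{t-1} M^{t-1-i} Y_i$ with the matrix always on the left, as the order-of-operands remark requires. It uses integer lists and naive matrix powers for clarity, not efficiency; all function names are hypothetical.

```python
def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(M, t):
    P = [[int(i == j) for j in range(len(M))] for i in range(len(M))]   # identity
    for _ in range(t):
        P = mat_mul(P, M)
    return P


def closed_form(M, x0, ys, t):
    """X_t = M^t X_0 + sum_{i=0}^{t-1} M^(t-1-i) Y_i."""
    x = mat_vec(mat_pow(M, t), x0)
    for i in range(t):
        term = mat_vec(mat_pow(M, t - 1 - i), ys[i])
        x = [a + b for a, b in zip(x, term)]
    return x


def iterate(M, x0, ys, t):
    """Direct iteration of X_{i+1} = M X_i + Y_i for t steps."""
    x = list(x0)
    for i in range(t):
        x = [a + b for a, b in zip(mat_vec(M, x), ys[i])]
    return x
```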


7.5.1 Reduction to one dimension 


General multi-dimensional systems are difficult to analyze, but linear
systems have such nice additive properties that we can hope that
multi-dimensional linear systems are not much more complicated than
one-dimensional linear systems. We will show that a multi-dimensional linear
system can in fact be decomposed into a set of one-dimensional linear systems.
We start with a special case that often arises in practice.


Theorem 7.5.2. The k-dimensional linear difference equation

$$X_{t+1} = MX_t + Y_t$$

can be decomposed into the set of one-dimensional difference equations

$$\begin{aligned}
\hat{x}_1(t+1) &= \lambda_1 \hat{x}_1(t) + \hat{y}_1(t),\\
\hat{x}_2(t+1) &= \lambda_2 \hat{x}_2(t) + \hat{y}_2(t),\\
&\;\;\vdots\\
\hat{x}_k(t+1) &= \lambda_k \hat{x}_k(t) + \hat{y}_k(t),
\end{aligned}$$

if the matrix M has a basis of eigenvectors. Here, $\lambda_1, \ldots, \lambda_k$ are the
eigenvalues and the $\hat{x}$'s and $\hat{y}$'s are linear combinations of the components
of X and Y, respectively.


Proof. If Z is a (column) eigenvector of M, then $MZ = \lambda Z$, where λ is the
corresponding eigenvalue. When M has k linearly independent eigenvectors
$Z_1, Z_2, \ldots, Z_k$ with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, then

$$M\,[Z_1\; Z_2\; \cdots\; Z_k] = [Z_1\; Z_2\; \cdots\; Z_k]\begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_k \end{pmatrix}.$$

(Here $[Z_1\; Z_2\; \cdots\; Z_k]$ is the k × k matrix whose ith column is the ith
eigenvector.) The equation holds because postmultiplying a matrix by a diagonal
matrix results in multiplying each component of the ith column by the ith
entry on the diagonal. Putting this in matrix form, we have

$$MZ = Z\Lambda,$$

where Z is the matrix of eigenvectors and Λ is the diagonal matrix of
eigenvalues. Since the eigenvectors are linearly independent, Z is an invertible
matrix and

$$Z^{-1}MZ = \Lambda \quad\text{and}\quad M = Z\Lambda Z^{-1}.$$

Replacing M in the matrix equation gives

$$X_{t+1} = Z\Lambda Z^{-1}X_t + Y_t,$$

and multiplying by $Z^{-1}$ gives

$$(Z^{-1}X_{t+1}) = \Lambda(Z^{-1}X_t) + (Z^{-1}Y_t).$$

So setting $\hat{X}_t = Z^{-1}X_t$ and $\hat{Y}_t = Z^{-1}Y_t$ (which are linear combinations
of X and Y) gives

$$\hat{X}_{t+1} = \Lambda\hat{X}_t + \hat{Y}_t.$$

Because Λ is diagonal, each component of this equation is independent of
the other components, and so we can write the k-dimensional difference
equation as a set of k one-dimensional difference equations. □


Once a matrix difference equation has been reduced to its one-dimensional 
form, the methods in earlier chapters can be used to find the solutions to 
these equations. Re-assembling these component solutions into a vector and 
multiplying by the matrix Z gives the solution X; to the matrix difference 
equation. 
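The decouple-solve-reassemble procedure can be demonstrated on a small example where the eigenvectors are known by hand. This sketch assumes M = [[2, 1], [1, 2]], whose eigenvalues are 3 and 1 with eigenvectors (1, 1) and (1, −1); the inverse of Z is written out explicitly, and `Fraction` keeps the arithmetic exact. The function names are hypothetical.

```python
from fractions import Fraction

M = [[2, 1], [1, 2]]
Z = [[1, 1], [1, -1]]                          # columns: eigenvectors for 3 and 1
Z_inv = [[Fraction(1, 2), Fraction(1, 2)],
         [Fraction(1, 2), Fraction(-1, 2)]]
LAMBDAS = [3, 1]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def solve_directly(x0, ys):
    x = list(x0)
    for y in ys:
        x = [a + b for a, b in zip(mat_vec(M, x), y)]
    return x


def solve_decoupled(x0, ys):
    x_hat = mat_vec(Z_inv, x0)                 # X-hat_0 = Z^{-1} X_0
    for y in ys:
        y_hat = mat_vec(Z_inv, y)              # Y-hat_t = Z^{-1} Y_t
        # k independent scalar recurrences: x-hat_i(t+1) = lambda_i x-hat_i(t) + y-hat_i(t)
        x_hat = [lam * xh + yh for lam, xh, yh in zip(LAMBDAS, x_hat, y_hat)]
    return mat_vec(Z, x_hat)                   # reassemble: X_t = Z X-hat_t


ys = [[1, 0], [0, 2], [5, -1]]
print(solve_directly([1, 4], ys), solve_decoupled([1, 4], ys))
```

Both routes give the same $X_3$; in the decoupled route the components never interact until the final multiplication by Z.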

This reduction can also be described in terms of the eigenvectors and
generalized eigenvectors of $M^T$. As we observed in Section 2.4, a matrix
also has row eigenvectors, by which we mean a row vector R such that
$RM = \lambda R$, where λ is the corresponding eigenvalue. Starting with the
equation

$$X_{t+1} = MX_t + Y_t, \tag{7.7}$$

we can multiply by a row eigenvector R to obtain

$$RX_{t+1} = RMX_t + RY_t.$$

Since $r(t) = RX_t$ is one-dimensional, we obtain the one-dimensional
recurrence

$$r(t+1) = \lambda r(t) + RY_t$$

with solution

$$r(t) = \lambda^t r(0) + \sum_{j=0}^{t-1} \lambda^{t-1-j} RY_j.$$
If M has a basis of eigenvectors, the solution $X_t$ to (7.7) can be written
as a linear combination of the solutions to a set of one-dimensional linear
equations. In fact, if $V_1, \ldots, V_k$ are the column eigenvectors of M, then

$$X_t = \sum_{i=1}^{k} r_i(t)\,V_i,$$

where each $r_i(t)$ is the solution to one of the one-dimensional equations
(with associated row eigenvector $R_i$), the row eigenvectors are chosen so that
$R_iV_j = 0$ for $i \ne j$ (that is, as the rows of the inverse of $[V_1 \cdots V_k]$),
and the column vector $V_i$ is normalized so that $R_iV_i = 1$.
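The row-eigenvector form can be checked on the same hand-worked example, M = [[2, 1], [1, 2]]: here $R_1 = (1/2, 1/2)$ with $R_1M = 3R_1$, $R_2 = (1/2, -1/2)$ with $R_2M = R_2$, and the columns $V_1 = (1, 1)^T$, $V_2 = (1, -1)^T$ satisfy $R_iV_j = 1$ if i = j and 0 otherwise. The sketch evaluates each $r_i(t)$ from its closed form and reassembles $X_t = \sum_i r_i(t)V_i$. This example and the names `r` and `x_from_modes` are illustrative assumptions, not from the text.

```python
from fractions import Fraction

ROWS = [[Fraction(1, 2), Fraction(1, 2)],     # R_1, with R_1 M = 3 R_1
        [Fraction(1, 2), Fraction(-1, 2)]]    # R_2, with R_2 M = 1 R_2
COLS = [[1, 1], [1, -1]]                      # V_1, V_2 with R_i V_j = 1 iff i = j
LAMBDAS = [3, 1]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def r(i, x0, ys, t):
    """r_i(t) = lambda^t r_i(0) + sum_{j=0}^{t-1} lambda^(t-1-j) R_i Y_j."""
    lam, R = LAMBDAS[i], ROWS[i]
    value = lam ** t * dot(R, x0)
    for j in range(t):
        value += lam ** (t - 1 - j) * dot(R, ys[j])
    return value


def x_from_modes(x0, ys, t):
    """X_t = sum_i r_i(t) V_i, assembled component by component."""
    return [sum(r(i, x0, ys, t) * COLS[i][c] for i in range(2)) for c in range(2)]


print(x_from_modes([1, 4], [[1, 0], [0, 2], [5, -1]], 3))
```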

This special situation often arises in modeling physical systems because 
the laws of physics often lead to real symmetric matrices and such matrices 
always have a basis of eigenvectors. Recall that such matrices are diago- 
nalizable, which means that they are similar to a diagonal matrix. When 
the matrix is not diagonalizable, the above method fails, but we can fall 
back on using the Jordan Canonical Form for the matrix. (See Section 7.1 
and Appendix C.) Performing the above construction with the Jordan form 
leads to a set of difference equations of the form 


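$$\hat{X}_{t+1} = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0\\ 0 & \lambda & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda & 1\\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix} \hat{X}_t + \hat{Y}_t.$$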


This means that every matrix difference equation decomposes into a set
of matrix difference equations in which the matrices are Jordan blocks.
Each Jordan block is similar to a companion matrix whose characteristic
polynomial is $(x - \lambda)^r$, where r is the size of the block (see Exercise 7.13),
which means that any matrix (even with entries in a general field) is similar
to a blockwise companion matrix. (We've seen this before in Section 2.3.2,
and the basis in Corollary 2.3.5 records the similarity transformation.) A
blockwise companion matrix that is similar to M is called a Rational
Canonical Form for M. The reason we've used the article "a" rather
than "the" here is that Rational Canonical Form is not unique; rather, a
matrix can be similar to a variety of rational forms. (See Exercise 7.12.) The
next theorem uses a divisibility condition on the characteristic polynomials
of the companion matrices to identify one type of Rational Canonical Form.
Of course, one can choose other relations on the characteristic polynomials
to obtain a different Rational Canonical Form. (Refer to [78, Sections
7.1–7.2].)


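Theorem 7.5.3 (Rational Canonical Form). A square matrix M is
similar to a block companion matrix

$$\begin{pmatrix} C_1 & & & \\ & C_2 & & \\ & & \ddots & \\ & & & C_r \end{pmatrix}$$

in which the sequence of characteristic polynomials is a divisor sequence,
which means that each element divides the next element of the sequence.

Recall that an m × m companion matrix C has the form

$$C = \begin{pmatrix} c_1 & c_2 & \cdots & c_{m-1} & c_m\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}$$

with characteristic polynomial $\mathrm{ch}_C(x) = x^m - c_1x^{m-1} - \cdots - c_m$.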


This is a very powerful theorem, because it applies to matrices with 
entries in any field and because all operations used to convert a matrix 
to Rational Canonical Form are rational operations in the field. This is in 
sharp contrast to the Jordan Canonical Form, which requires one to find
the roots of polynomials. This is difficult to do for many reasons, including 
the fact that some polynomials do not have roots in the field. For instance, 
one may start with a rational or real matrix and have to go to complex 
matrices in order to obtain the Jordan form, and even then, some roots may 
not be expressible by an algebraic formula. Although Rational Canonical 
Form always exists, it may not be easy to find. Algorithms for this problem 
are known (for example, refer to Harrison [75]), but none is straightforward 
enough to be included here. However, knowledge of the existence of Rational 
Canonical Form can be used to obtain the following result. 
Theorem 7.5.4. Any k-dimensional matrix difference equation can be
reduced to a one-dimensional difference equation whose order is at most k. 


We do not prove this result in general, but instead give an idea of how 
Rational Canonical Form could be used to prove it. Under the similarity 
that transforms the matrix into its rational decomposition, the original 
difference equation is changed into a set of matrix difference equations in 
which all matrices are in companion form. To see how a companion matrix difference equation can be decomposed into one-dimensional difference equations, let us simply consider a $3 \times 3$ example, the companion matrix for the polynomial $x^3 - c_1x^2 - c_2x - c_3$. The component equations for the matrix difference equation are:


$$\begin{aligned} x_1(t+1) &= c_1x_1(t) + c_2x_2(t) + c_3x_3(t) + y_1(t),\\ x_2(t+1) &= x_1(t) + y_2(t),\\ x_3(t+1) &= x_2(t) + y_3(t). \end{aligned}$$

From this,

$$x_3(t) = x_2(t-1) + y_3(t-1), \quad\text{where}\quad x_2(t-1) = x_1(t-2) + y_2(t-2),$$

and so

$$x_3(t) = x_1(t-2) + y_2(t-2) + y_3(t-1),$$

which gives

$$\begin{aligned} x_1(t+1) &= c_1x_1(t) + c_2[x_1(t-1) + y_2(t-1)] + c_3[x_1(t-2) + y_2(t-2) + y_3(t-1)] + y_1(t)\\ &= c_1x_1(t) + c_2x_1(t-1) + c_3x_1(t-2) + c_2y_2(t-1) + c_3y_2(t-2) + c_3y_3(t-1) + y_1(t), \end{aligned}$$

a third-order one-dimensional difference equation. The other components are just shifted versions of the sequence $(x_1(t))$ with some of the input sequence added. For example,

$$x_3(t) = x_1(t-2) + y_2(t-2) + y_3(t-1),$$

which is a trivial difference equation whose output $x_3(t)$ is determined entirely by its inputs, because $x_3$ does not appear on the right side of the equation.
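The reduction above can be checked numerically. The following Python sketch (the function names are our own, hypothetical helpers) iterates the $3 \times 3$ companion system directly. It then separately iterates the third-order scalar equation for $x_1(t)$, taking the first three values of $x_1$ from the matrix run as initial conditions. With integer data the two sequences should agree exactly.

```python
import random


def x1_from_matrix_form(c, y, x0, steps):
    """Iterate the 3x3 companion system and return x1(0), ..., x1(steps).

    x1(t+1) = c1*x1(t) + c2*x2(t) + c3*x3(t) + y1(t)
    x2(t+1) = x1(t) + y2(t)
    x3(t+1) = x2(t) + y3(t)
    """
    c1, c2, c3 = c
    x1, x2, x3 = x0
    history = [x1]
    for t in range(steps):
        y1, y2, y3 = y[t]
        x1, x2, x3 = c1 * x1 + c2 * x2 + c3 * x3 + y1, x1 + y2, x2 + y3
        history.append(x1)
    return history


def x1_from_scalar_form(c, y, first_three, steps):
    """Iterate the third-order equation for x1, which is valid for t >= 2."""
    c1, c2, c3 = c
    x1 = list(first_three)  # x1(0), x1(1), x1(2)
    for t in range(2, steps):
        # y[t] = (y1(t), y2(t), y3(t))
        x1.append(c1 * x1[t] + c2 * x1[t - 1] + c3 * x1[t - 2]
                  + c2 * y[t - 1][1] + c3 * y[t - 2][1] + c3 * y[t - 1][2]
                  + y[t][0])
    return x1


random.seed(0)
coeffs = (2, -1, 3)
steps = 25
inputs = [tuple(random.randint(-5, 5) for _ in range(3)) for _ in range(steps)]
matrix_run = x1_from_matrix_form(coeffs, inputs, (1, 4, -2), steps)
scalar_run = x1_from_scalar_form(coeffs, inputs, matrix_run[:3], steps)
print(matrix_run == scalar_run)
```

The scalar form needs $x_1(0), x_1(1), x_1(2)$ as initial data because it is third order. This is why the sketch seeds it from the matrix run.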


7.5.2 Reduction to homogeneous form 


The matrix difference equation (7.7) can be reduced to homogeneous form if the input is well-behaved. Specifically, if there is a homogeneous matrix difference equation $Z_{t+1} = AZ_t$ and $Y_t = PZ_t$, then (7.7) can be rewritten as $U_{t+1} = \mathcal{M}U_t$, where

$$U_{t+1} = \begin{pmatrix} X_{t+1} \\ Z_{t+1} \end{pmatrix} = \begin{pmatrix} M & P \\ 0 & A \end{pmatrix} \begin{pmatrix} X_t \\ Z_t \end{pmatrix} = \mathcal{M}\,U_t.$$

This allows one to write the solution to (7.7) as

$$X_t = Q\,\mathcal{M}^t \begin{pmatrix} X_0 \\ Z_0 \end{pmatrix},$$

where Q is the k-dimensional projection matrix that returns the first k components of its input vector. This reduction replaces a nonhomogeneous equation by a homogeneous equation at the cost of increasing the size of the matrices in the homogeneous equation.
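As a sanity check of this block construction, the following Python sketch uses plain lists and no libraries; the helper names are hypothetical. It iterates (7.7) directly, with $Y_t = PZ_t$ and $Z_{t+1} = AZ_t$. It then compares the result with the first k entries of the enlarged block matrix (blocks M, P, 0, A) applied t times to $(X_0, Z_0)$. The particular matrices are arbitrary illustrative choices.

```python
def mat_vec(A, v):
    """Matrix-vector product, with the matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def block_matrix(M, P, A):
    """Assemble [[M, P], [0, A]] from a k x k M, a k x j P, and a j x j A."""
    k, j = len(M), len(A)
    top = [list(M[i]) + list(P[i]) for i in range(k)]
    bottom = [[0] * k + list(A[i]) for i in range(j)]
    return top + bottom


def solve_directly(M, P, A, X0, Z0, t):
    """X_{s+1} = M X_s + Y_s with Y_s = P Z_s and Z_{s+1} = A Z_s."""
    X, Z = list(X0), list(Z0)
    for _ in range(t):
        Y = mat_vec(P, Z)
        X = [mx + y for mx, y in zip(mat_vec(M, X), Y)]
        Z = mat_vec(A, Z)
    return X


def solve_homogeneous(M, P, A, X0, Z0, t):
    """Apply the block matrix t times to (X0, Z0) and keep the first k entries."""
    big = block_matrix(M, P, A)
    U = list(X0) + list(Z0)
    for _ in range(t):
        U = mat_vec(big, U)
    return U[:len(X0)]  # this slice plays the role of the projection Q


M = [[1, 1], [1, 0]]  # illustrative 2x2 system matrix
P = [[1], [0]]        # Y_t = P Z_t feeds the input into the first component
A = [[3]]             # Z_{t+1} = 3 Z_t, so Y_t = (3^t, 0)^T when Z_0 = 1
X0, Z0 = [0, 1], [1]
for t in range(10):
    assert solve_directly(M, P, A, X0, Z0, t) == solve_homogeneous(M, P, A, X0, Z0, t)
```

Repeated multiplication is used here for clarity. Any faster way of forming powers of the block matrix would give the same $X_t$.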

Since the new matrix $\mathcal{M} = \begin{pmatrix} M & P \\ 0 & A \end{pmatrix}$ has a special form, we can hope that computing powers of $\mathcal{M}$ may be easier than computing powers of a general matrix. If the matrices M and A do not have any eigenvalues in common, then an easier computation is possible. If $V_i$ is an eigenvector or generalized eigenvector of M, then $(V_i^T, 0^T)^T$ is a corresponding eigenvector or generalized eigenvector of $\mathcal{M}$. Similarly, if $U_j$ is an eigenvector or generalized eigenvector of A corresponding to the eigenvalue $\lambda_j$, and if $\lambda_j$ is not an eigenvalue of M, then there exists a vector $V_j$ such that $(V_j^T, U_j^T)^T$ is a corresponding eigenvector or generalized eigenvector of $\mathcal{M}$. The assumption that $\lambda_j$ is not an eigenvalue of M is needed to ensure that $M - \lambda_j I$, and hence each of its powers, is a nonsingular matrix, and hence that the linear equation for $V_j$ has a solution. These considerations give the following result.


Theorem 7.5.5. If M and A have no eigenvalues in common, then a solution to

$$X_{t+1} = MX_t + Y_t$$

can be written in the form

$$X_t = \sum_{i=1}^{k} \alpha_i \lambda_i^t V_i + \sum_{j=1}^{k_0} \beta_j \lambda_j^t V_j.$$


7.6 Exercises 
Ex 7.1. Let 


$$M = \begin{pmatrix} -4 & 9 \\ -4 & 8 \end{pmatrix}.$$

Use the two solutions $x_1(n) = n2^{n-1}$ and $x_2(n) = 2^n$ to find an expression for $M^n$. Compare your result to that found in (7.3).


Ex 7.2. Construct a graph on 20 vertices that has only three simple cycles, 
of lengths 6, 15, and 20. Find the least k such that there is a path of length k between every pair of vertices. Construct the corresponding matrix M and show that k is the least positive exponent such that $M^k > 0$.
Ex 7.3. Let A be a $10 \times 10$ Leslie matrix in which $f_8$ is the last positive fertility rate. Assume that the initial population is $x_1 = x_2 = x_3 = x_4 = x_5 = x_6 = 0$, $x_7 = 100$, $x_8 = 100$, $x_9 = 90$, $x_{10} = 10$. Find the asymptotic population vector for $X_{t+1} = AX_t$.


Ex 7.4. Let $A = \begin{pmatrix} 0 & 0 & 2 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ and $X_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$. Calculate the asymptotic population vector for $X_{t+1} = AX_t$. How does this calculation reflect the remarks about imprimitive matrices?


Ex 7.5. Let g be the gcd of the cycle lengths for a nonnegative strongly connected matrix M. Is $M^g$ primitive? Does $M^g$ decompose into a set of primitive submatrices?


Ex 7.6. Given the following graph, determine the gcd g of the cycle lengths, 
and find the decomposition of the graph into g disconnected subgraphs. 


    v2 → v3
    ↑    |
    v1 ← v4 → v5 → v6
    |              |
    v9  ←  v8   ←  v7


Ex 7.7. Let $X_{t+1} = MX_t$ with $M = \begin{pmatrix} 0 & 0 & 0 \\ 1/3 & 0 & 1 \\ 2/3 & 1 & 0 \end{pmatrix}$. Find the graph for this system. Find the DAG for this graph. Is the system a Markov chain? Is it periodic? Find the solution for $X_0 = (x_1(0), x_2(0), x_3(0))^T$ and give initial conditions that lead to an asymptotic fixed point.


0 0 + 0 
Ex 7.8. Let X:11 = MX; with M = |" _ oe Draw th h 
xX F.0. e t+1 = t Wl = 0 4 0 rab Taw e grap 
+ 0 0 0 


and DAG for this system. Find the period g of the system and show how $M^g$ decomposes into subsystems.


Ex 7.9. Let $X_{t+1} = MX_t$ with $M = \begin{pmatrix} 0 & 2 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}$. Find the graph and DAG for this system. Let $X_0 = (x_1(0), x_2(0), x_3(0))^T$ and find

Compare this to the predicted asymptotic behavior based on a graphical analysis.


Ex 7.10. Let $X_{t+1} = MX_t$ where $M = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 9 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$. Draw the graph and DAG for this system. Is this a Markov chain? Predict the asymptotic behavior by a graphical analysis. Use a computer to follow the evolution of $X_t$ starting from various nonnegative initial conditions. Do the computed results for $X_t$ agree with your predictions?


Ex 7.11. A graph is in Leslie normal form iff it has n vertices $v_1, v_2, \ldots, v_n$ and for all i, $v_i \to v_{i+1 \bmod n}$, and all other edges have the form $v_j \to v_1$; that is, all other edges go to $v_1$. (Multiple edges are not allowed, but a self-loop $v_1 \to v_1$ is allowed.) Two strongly connected graphs are g-equivalent iff
the gcd of the set of cycle lengths is the same for each graph. Two strongly 
connected graphs are c-equivalent iff the set of cycle lengths for the two 
graphs is the same. 

(a) Is every strongly connected graph c-equivalent to a graph in Leslie
normal form? 

(b) Is every strongly connected graph g-equivalent to a graph in Leslie 
normal form? 

(c) If every strongly connected graph is equivalent to a graph in Leslie 
normal form, is the Leslie graph unique? If not, how many Leslie 
graphs are equivalent to a given strongly connected graph? 

(d) Is there a minimum Leslie graph equivalent to a given strongly con- 
nected graph? 

(e) If there is such a minimum Leslie graph, does it have a special form? 


Ex 7.12. Show that a matrix may be similar to more than one blockwise 
companion matrix. 


Hint: Consider the effect of various vectors on the matrix i . 


Ex 7.13. Show that a $k \times k$ Jordan block is similar to a companion matrix whose characteristic polynomial is $(x - \lambda)^k$, where $\lambda$ is the eigenvalue of the Jordan block.

Hint: Consider the effect of this companion matrix on the eigenvectors and 
generalized eigenvectors. 


Ex 7.14. When $\lambda_1$ is a simple eigenvalue of the matrix M, show that the solution to

$$X_{t+1} = MX_t$$

can be written as

$$X_t = \alpha\lambda_1^t C + Y_t,$$

where C is the column eigenvector for $\lambda_1$, R is the row eigenvector for $\lambda_1$, $\alpha = (RX_0)/(RC)$, and $Y_t$ is a vector such that $RY_t = 0$. (Compare this with Theorem 2.4.2.)


s. <3) =19 
Ex 7.15. Let Xi41= MX; with M=|1 0 0 |. Find the eigen- 
ae og 


values of the matrix and use them to predict the asymptotic behavior of 
solutions to the difference equation. Find the solution for the initial value 
vector $(2,0,0)^T$. Compare this solution to the solution with initial value vector $(1,1,1)^T$. Find the companion form and the Jordan form for the
matrix and use them to explain the differences between the two solutions. 


Ex 7.16. Write the following coupled pair of difference equations as a matrix difference equation:

$$s_n = 2t_{n-1} + s_{n-2}, \qquad t_n = -s_{n-1} + t_{n-2}.$$

(What size matrix do you need?) Show that all solutions to these equations have periods that divide 8. Solve the initial value problem with initial values

$$s_0 = 1,\quad s_1 = 0,\quad t_0 = 0,\quad t_1 = 1.$$

Compare your solution to the solution found in Exercise 4.20.


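Ex 7.17. Let $X_{t+1} = HX_t$ where

$$H = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}.$$

Show that H has only two eigenvalues but four linearly independent eigenvectors. Show that the solution obeys

$$X_t = \begin{cases} 2^t X_0 & \text{if } t \text{ even},\\ 2^{t-1} X_1 & \text{if } t \text{ odd}, \end{cases}$$

where $X_1 = HX_0$. Give some initial conditions such that $X_1$ is very different from $X_0$.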


Ex 7.18. Let $X_{t+1} = MX_t$ with $M = \begin{pmatrix} 5 & 3 & 2 \\ 3 & 2 & 1 \\ 2 & 1 & 1 \end{pmatrix}$. Find the eigenvalues and eigenvectors of the matrix M and use them to predict the asymptotic behavior of solutions to the difference equation. Show that if $X_0 \gg 0$, then $X_t = O(\lambda_0^t)$, where $\lambda_0$ is the unique positive eigenvalue of M. More strongly, show that if $X_0 > 0$, then $\lim_{t\to\infty}(X_t - \lambda_0^t E_1) = 0$, where $ME_1 = \lambda_0 E_1$. Show that for all $X_0$, $\lim_{t\to\infty}(X_t - \lambda_0^t E_1) = 0$, where $ME_1 = \lambda_0 E_1$, but for some $X_0$'s one must take $E_1 = 0$.
Ex 7.19. Let $X_{t+1} = MX_t$ with $M = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$. Find conditions under which $X_t = \Theta((2 + \sqrt{2})^t)$ and conditions under which $X_t = \Theta(2^t)$.


Ex 7.20. Show that exactly one of the following two statements holds for 
a strongly connected graph on n vertices: 
(a) The graph is primitive and there is a vertex v such that the lengths 
of all cycles of length < n that contain v have gcd = 1. 
(b) The graph is not primitive and for all vertices v the lengths of all 
cycles of length < 2n — 1 that contain v have gcd > 1. 
Use this result to create an algorithm that takes a strongly connected graph 
as input and decides whether the graph is primitive. Show that your algo- 
rithm has complexity O(n|E|) for a graph with n vertices and |E| edges. 


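Ex 7.21. Let $X_{t+1} = MX_t + Y_t$ with $Y_{t+1} = MY_t$. If M has a basis of eigenvectors $V_1, \ldots, V_k$ with eigenvalues $\lambda_1, \ldots, \lambda_k$, show that every solution $X_t$ can be written as

$$X_t = \sum_{i=1}^{k} (\alpha_i t + \beta_i)\lambda_i^t V_i,$$

where the $\alpha_i$'s depend only on the initial conditions for $Y_t$.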


8 


Modular Recurrences 


In this chapter we consider recurrences modulo a fixed positive integer. For any positive integer $m \ge 2$, the output of the operation of reducing an integer modulo m (which we usually refer to as mod m) is the remainder after division by m, where the remainder is chosen to lie in the set $\{0, 1, \ldots, m-1\}$. For instance, computing a few terms of the Fibonacci sequence mod 6 gives

$$1, 1, 2, 3, 5, 2, 1, 3, 4, 1, 5, 0, 5, \ldots, \tag{8.1}$$

where each term is the "mod 6 sum" of the two previous terms. We are interested in answers to the following types of questions:

• Is this sequence periodic or eventually periodic mod 6?

• What is its period?

• What is the largest period of a sequence that satisfies the Fibonacci recurrence mod 6?

• How many different sequences satisfy this recurrence mod 6?


The first question has a fairly quick answer, which we give in the next 
section. If we have only basic properties of modular arithmetic at our dis- 
posal, answers to the other questions can be quite complicated. Instead of 
using only these basic properties, we give a more sophisticated yet accessible
point of view, which will shed light on the general structure of recurrences 
mod m and provide straightforward answers to these questions. We end the 
chapter with applications of modular recurrences to pseudorandom number 
generation and to factorization of integers. 


8.1 Periodicity 


There's no apparent pattern in the first few terms of the Fibonacci numbers mod 6 as listed in (8.1). Continuing further into the sequence,

$$1, 1, 2, 3, 5, 2, 1, 3, 4, 1, 5, 0, 5, 5, 4, 3, 1, 4, 5, 3, 2, 5, 1, 0, 1, 1, 2, 3, \ldots, \tag{8.2}$$

we see that the initial pair 1, 1 recurs. Because it is obtained from a second-order recurrence, the sequence will repeat once the initial pair occurs again. Even without listing any elements of the sequence, we know that because there are only 36 different pairs of elements mod 6, at least one repetition of a pair must occur among the first 37 consecutive pairs (that is, within the first 38 terms) of the sequence. A rewording of this is helpful. Let X be the set of all ordered pairs of integers mod 6, and let f be the Fibonacci function mod 6 defined on X by

$$f(x, y) = (y, x + y \bmod 6), \tag{8.3}$$

where the second coordinate is specified to be the least nonnegative remainder mod 6, and so $f(x, y)$ is a function on X. Since there are only $6^2$ different pairs of integers mod 6, any list of 37 consecutive pairs in a sequence defined by the second-order recurrence mod 6 must contain a repeated pair. This argument holds for any second-order recurrence mod 6 with any pair of initial values.

Putting this in a general context: For a function f defined on a set X, we will use $f^{(n)}$ to denote the $n$th iterate of f,

$$f^{(n)} = \underbrace{f \circ f \circ \cdots \circ f}_{n \text{ times}},$$

and the orbit of $x \in X$ is the sequence

$$x, f(x), f^{(2)}(x), f^{(3)}(x), f^{(4)}(x), \ldots,$$

the sequence formed by starting with the value x and applying f again and again. If there is some $n > 0$ such that $f^{(n)}(x) = x$, the orbit of x is called a periodic orbit, x is called a periodic point, and its period is the least positive t such that $f^{(t)}(x) = x$. When $t = 1$ holds, x is called a fixed point. From (8.2) we see that (1, 1) is a periodic point of the Fibonacci function mod 6, and its period is 24.
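The period of 24 can be confirmed by brute force. This Python sketch (the function names are ours) applies the Fibonacci function mod m to the pair (1, 1) until the pair returns. It assumes that (1, 1) is a periodic point, which the text goes on to establish for every modulus; if it were not, the loop would never end.

```python
def fibonacci_period(m, start=(1, 1)):
    """Least t >= 1 with f^(t)(start) == start, for f(x, y) = (y, x + y mod m).

    Assumes `start` is a periodic point (otherwise the loop would not end).
    """
    pair, t = start, 0
    while True:
        x, y = pair
        pair = (y, (x + y) % m)
        t += 1
        if pair == start:
            return t


def fibonacci_mod_terms(m, count):
    """First `count` terms of the sequence 1, 1, 2, 3, ... reduced mod m."""
    a, b = 1, 1
    terms = []
    for _ in range(count):
        terms.append(a)
        a, b = b, (a + b) % m
    return terms


print(fibonacci_mod_terms(6, 13))  # [1, 1, 2, 3, 5, 2, 1, 3, 4, 1, 5, 0, 5], as in (8.1)
print(fibonacci_period(6))         # 24
```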
The orbit of a periodic point with period t can be visualized as forming a closed loop that returns to its starting point after t iterations of the function f. Sometimes an orbit might get into a loop without returning to its initial value. (Refer to Figure 8.1.) We will call x an eventually periodic point if there exists a positive integer t such that $f^{(n+t)}(x) = f^{(n)}(x)$ holds for all sufficiently large n, and the smallest such t is its period. Notice that under this definition every periodic point is also eventually periodic.

FIGURE 8.1. Periodic and Eventually Periodic Orbits.

For example, for the function $f(x) = -x^2 + 2x + 1$ on the set $X = \mathbb{Z}$, the orbit of $x = 0$ is the sequence $0, 1, 2, 1, 2, 1, 2, \ldots$, so $x = 0$ is an eventually periodic point of f that is not periodic, and both $x = 1$ and $x = 2$ are periodic points of f. For each of these three values of x the period is two.


Theorem 8.1.1. If f is a function on a finite set X, then every element of X is an eventually periodic point of f, and its period is at most the number of elements in X. If f is a one-to-one function on any (not necessarily finite) set, then every eventually periodic point of f is periodic. In particular, if f is a one-to-one function on a finite set X, then every element of X is a periodic point of f.


Proof. If X has n elements, the list $x, f(x), \ldots, f^{(n)}(x)$ contains $n + 1$ elements of X and so must contain a duplicate, say $f^{(i)}(x) = f^{(j)}(x)$ for some $0 \le i < j \le n$. Then

$$f^{(i+1)}(x) = f(f^{(i)}(x)) = f(f^{(j)}(x)) = f^{(j+1)}(x),$$

and by induction $f^{(i+k)}(x) = f^{(j+k)}(x)$ for all $k \ge 0$. Therefore, for $m \ge i$,

$$f^{(m+(j-i))}(x) = f^{(j+(m-i))}(x) = f^{(i+(m-i))}(x) = f^{(m)}(x),$$

and we see that x is an eventually periodic point whose period is at most $j - i \le n$.
Suppose f is one-to-one and the period of $x \in X$ is t. Then there exists a minimal N such that $f^{(n+t)}(x) = f^{(n)}(x)$ for all $n \ge N$. To show that x is periodic, we prove that $N = 0$. If N were non-zero, then we would have

$$f(f^{(N-1+t)}(x)) = f^{(N+t)}(x) = f^{(N)}(x) = f(f^{(N-1)}(x)),$$

and the fact that f is one-to-one would give $f^{(N-1+t)}(x) = f^{(N-1)}(x)$. This contradicts the assumed minimality of N, so $N = 0$ must be true. □
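The counting argument in this proof also suggests a simple way to find where an orbit enters its loop. The Python sketch below uses a dictionary-based method chosen for clarity; it is not a procedure taken from the text. It records the first index at which each value of the orbit appears. The first repeated value then gives both the minimal N of the proof and the period t. The function terminates whenever x is eventually periodic, which Theorem 8.1.1 guarantees on a finite set.

```python
def preperiod_and_period(f, x):
    """Return (N, t): f^(n+t)(x) == f^(n)(x) for all n >= N, with N and t minimal.

    Terminates only if x is eventually periodic under f.
    """
    first_index = {}
    n, value = 0, x
    while value not in first_index:
        first_index[value] = n
        value = f(value)
        n += 1
    N = first_index[value]  # index at which the orbit enters its loop
    return N, n - N         # after n - N more steps the orbit is back at f^(N)(x)


def quadratic(x):
    return -x * x + 2 * x + 1  # the example f(x) = -x^2 + 2x + 1 on the integers


for x in (0, 1, 2):
    print(x, preperiod_and_period(quadratic, x))
# 0 (1, 2)   eventually periodic but not periodic
# 1 (0, 2)
# 2 (0, 2)
```

The same function accepts pairs, so it can also be applied to the Fibonacci function mod 6 with the starting pair (1, 1).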


For general integer $m \ge 2$, the Fibonacci function mod m (also called a modular Fibonacci function) is

$$F(x, y) = (y, x + y \bmod m).$$


Since this is a one-to-one function on the ordered pairs of integers mod 
m, and the set of integers mod m is finite, the previous result gives the 
following corollary. 


Corollary 8.1.2. Each modular Fibonacci function has a periodic orbit 
for every pair of initial values. 


Let’s compute the period of the Fibonacci sequence mod m for four more 
values of m: 


m = 2: 1, 1, 0, 1, 1, ..., whose period is 3;
m = 3: 1, 1, 2, 0, 2, 2, 1, 0, 1, 1, ..., whose period is 8;
m = 4: 1, 1, 2, 3, 1, 0, 1, 1, ..., whose period is 6;
m = 12: 1, 1, 2, 3, 5, 8, 1, 9, 10, 7, 5, 0, 5, 5, 10, 3, 1, 4, 5, 9, 2, 11, 1, 0, 1, 1, ..., whose period is 24.


We have already computed the period mod 6 to be 24, which is the product of the periods mod 2 and mod 3. Because of this, you might conjecture that this holds in general, but the period mod 12 is 24, which does not equal $8 \cdot 6$, the product of the periods mod 3 and mod 4. After some deliberation you can see that 24 does equal lcm(8, 6), the least common multiple (lcm) of 8 and 6. Why the lcm occurs is explained in the next theorem.


Theorem 8.1.3. Let $X = \mathbb{Z}^k$ for some k and let f be any function defined on X. For $x \in X$ and positive integers $m_1, m_2$, let $t_i$ be the period of x under f modulo $m_i$, for each of $i = 1, 2$. Then $\mathrm{lcm}(t_1, t_2)$ is the period of x under f modulo $\mathrm{lcm}(m_1, m_2)$.


Proof. Let $m = \mathrm{lcm}(m_1, m_2)$ and $L = \mathrm{lcm}(t_1, t_2)$. If t is the period of x under f mod m, there exists N such that for all $n \ge N$,

$$f^{(n+t)}(x) \equiv f^{(n)}(x) \pmod m,$$

and since each of $m_1, m_2$ divides m, then

$$f^{(n+t)}(x) \equiv f^{(n)}(x) \pmod{m_i} \quad \text{for all } n \ge N.$$

Therefore, t must be a common multiple of $t_1$ and $t_2$, and L divides t.

On the other hand, for each of $i = 1, 2$, there exists $N_i$ such that $f^{(n+t_i)}(x) \equiv f^{(n)}(x) \pmod{m_i}$ for all $n \ge N_i$. Using $N = \max\{N_1, N_2\}$ and the fact that L is a multiple of both $t_1$ and $t_2$, we have that for all $n \ge N$ and each i,

$$f^{(n+L)}(x) \equiv f^{(n)}(x) \pmod{m_i}.$$

This means that the difference $f^{(n+L)}(x) - f^{(n)}(x)$ is divisible by each $m_i$ and so also by their least common multiple m, giving

$$f^{(n+L)}(x) \equiv f^{(n)}(x) \pmod m,$$

and so L is a multiple of the period t. Since we've already shown that L divides t (and they're both positive), then $L = t$. □


How do we find the period of the Fibonacci function for a general modulus? If we happened to know (or could easily find) a factorization of the modulus into a product of two relatively prime integers $m_1, m_2$ (that is, $\gcd(m_1, m_2) = 1$), then the theorem could be applied to find the period mod m. That's exactly what we noted for $m = 12$. The following result follows directly from this observation.

Corollary 8.1.4. If $m = p_1^{e_1} \cdots p_s^{e_s}$ is the prime factorization of m and $t_i$ is the period of x under f modulo $p_i^{e_i}$, then the period of x under f modulo m is $\mathrm{lcm}(t_1, \ldots, t_s)$.
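Corollary 8.1.4 can be spot-checked numerically for the Fibonacci function. The Python sketch below has hypothetical helper names, and `math.lcm` requires Python 3.9 or later. It compares two quantities: the period of (1, 1) modulo m found by direct iteration, and the lcm of the periods modulo the prime-power factors of m, which are found by trial division.

```python
from math import lcm


def fibonacci_period(m):
    """Period of (1, 1) under (x, y) -> (y, x + y mod m), for m >= 2."""
    start = pair = (1, 1)
    t = 0
    while True:
        pair = (pair[1], (pair[0] + pair[1]) % m)
        t += 1
        if pair == start:
            return t


def prime_power_factors(m):
    """[p1**e1, ..., ps**es] for m = p1**e1 * ... * ps**es, by trial division."""
    factors, p = [], 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            while m % p == 0:
                m //= p
                q *= p
            factors.append(q)
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def period_from_factors(m):
    return lcm(*(fibonacci_period(q) for q in prime_power_factors(m)))


for m in range(2, 200):
    assert fibonacci_period(m) == period_from_factors(m)
print(prime_power_factors(12), period_from_factors(12))  # [4, 3] 24
```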


8.1.1 Periodicity of linear modular recurrences 


We've been working with functions defined on $\mathbb{Z}^k$ that are then reduced mod m, but we could equally well have considered f to be defined on the set $X = \mathbb{Z}_m^k$. (Here $\mathbb{Z}_m$ means the set $\{0, 1, \ldots, m-1\}$ under the operations of addition and multiplication mod m.)

As we saw with the Fibonacci recurrence, a $k$th-order linear recurrence

$$s_{j+k} \equiv c_1 s_{j+k-1} + \cdots + c_k s_j + c_{k+1} \pmod m, \quad\text{where } c_k \not\equiv 0 \pmod m, \tag{8.4}$$

defines the function

$$S(x_1, \ldots, x_k) = (x_2, \ldots, x_k, c_k x_1 + \cdots + c_1 x_k + c_{k+1} \bmod m) \tag{8.5}$$

on $\mathbb{Z}_m^k$, where the last component is chosen to be the least nonnegative value of its congruence class mod m. (Notice that we are allowing nonhomogeneous equations with the constant forcing term $c_{k+1}$.)
Let's talk a bit more about notation. Normally, when 26 (mod 25) is written we think of 1, but 26 (mod 25) is of course an infinite set of integers, all integers that are congruent to 26 modulo 25. We've been using the notation t mod m (as contrasted with t (mod m)) to denote the least nonnegative integer that is congruent to t modulo m. For example, $3 \cdot 9 \bmod 5$ equals 2. Another way of saying this is that $3 \cdot 9$ equals 2 in $\mathbb{Z}_5$.

The following lemma follows from (8.5) because

$$S(x_1, \ldots, x_k) = S(y_1, \ldots, y_k) \iff x_2 = y_2, \ldots, x_k = y_k,\ c_k x_1 = c_k y_1.$$

Lemma 8.1.5. S is a one-to-one function on $\mathbb{Z}_m^k$ iff $c_k$ has a multiplicative inverse modulo m.

Orbits of S correspond to choices of initial values $s_0, \ldots, s_{k-1}$ in the recurrence (8.4). Since S is a function defined on a finite set with $m^k$ elements, from this lemma and Theorem 8.1.1 we obtain the following result.

Theorem 8.1.6. Each $(s_0, \ldots, s_{k-1})$ is an eventually periodic point of S. Moreover, if $c_k$ has a multiplicative inverse modulo m, then every orbit under S is periodic.


Because every element of $\mathbb{Z}_m$ has an additive inverse, an element that has a multiplicative inverse can be referred to as an invertible element without confusion as to which operation is meant. For the Fibonacci sequence, $c_k = 1$ is of course invertible for every modulus. This means that (regardless of the initial pair) every modular sequence generated by the Fibonacci recurrence is periodic.

It can be checked that the orbit of (1, 5) under the recurrence $s_{j+2} \equiv s_{j+1} + 3s_j \pmod{18}$ is

$$1, 5, 8, 5, 11, 8, 5, 11, 8, \ldots,$$

an eventually periodic sequence that is not periodic. Note that this is consistent with the last result, since 3 is not invertible mod 18, and $S(5, 8) = (8, 5) = S(11, 8)$ shows that S is not one-to-one.
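The orbit of (1, 5) and the collision $S(5, 8) = S(11, 8)$ are easy to reproduce. In this Python sketch (hypothetical helper names), `step` acts on a pair $(s_j, s_{j+1})$ exactly as the function S of (8.5) does for this particular recurrence.

```python
def step(pair, m=18):
    """(s_j, s_{j+1}) -> (s_{j+1}, s_{j+2}) for s_{j+2} = s_{j+1} + 3 s_j mod m."""
    s_j, s_j1 = pair
    return (s_j1, (s_j1 + 3 * s_j) % m)


def terms(start, count):
    """First `count` terms of the sequence whose initial pair is `start`."""
    seq, pair = list(start), start
    while len(seq) < count:
        pair = step(pair)
        seq.append(pair[1])
    return seq


print(terms((1, 5), 9))             # [1, 5, 8, 5, 11, 8, 5, 11, 8]
print(step((5, 8)), step((11, 8)))  # (8, 5) (8, 5): S is not one-to-one

# (1, 5) never reappears: there are only 18**2 pairs, so a periodic point
# would return within that many steps.
pair, returned = (1, 5), False
for _ in range(18 ** 2):
    pair = step(pair)
    returned = returned or pair == (1, 5)
print(returned)                     # False
```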

The periodicity of modular recurrences therefore depends only on the algebraic property of the invertibility of $c_k$. We next describe an efficient procedure for determining whether an element is invertible, which then becomes a method for deciding whether a specific modular linear recurrence is periodic. The procedure is the famous Euclidean Algorithm, which we've already seen several times.¹ We used this algorithm in Chapter 6 (and for polynomials in Chapter 4) to compute the greatest common divisor (gcd) of n-bit numbers in O(n) operations. Because we're interested in using the algorithm to compute more than the gcd, it's helpful to review the process here. Recall that the gcd of two non-zero integers a, m (which we denote by gcd(a, m)) is a positive integer that divides both a and m and is defined by the property that gcd(a, m) is divisible by all divisors of both a and m.

¹The Euclidean Algorithm can be found in Book VII of Euclid's Elements. A translation of this can be found at aleph0.clarku.edu/~djoyce/java/elements/bookVII/propVII2.html.


Theorem 8.1.7 (Euclidean Algorithm). Let m be any positive integer and a be any non-zero integer. Then the sequence of divisions

$$m = q_1 a + r_1,\quad a = q_2 r_1 + r_2,\quad r_1 = q_3 r_2 + r_3,\quad \ldots$$

(where $0 \le r_1 < |a|$ and $0 \le r_{i+1} < r_i$ for all $i \ge 1$) ends after finitely many steps, and the last non-zero remainder $r_K$ is $\gcd(a, m)$.

Proof. Since the sequence of remainders $(r_i)$ is a strictly decreasing sequence of nonnegative integers, it must be a finite sequence, and its last remainder is zero:

$$m = q_1 a + r_1;\ a = q_2 r_1 + r_2;\ \ldots;\ r_{K-2} = q_K r_{K-1} + r_K;\ r_{K-1} = q_{K+1} r_K.$$

From the definition of gcd, $\gcd(a, m) = \gcd(m - ka, a)$ holds for all integers k, and we successively obtain

$$\begin{aligned} \gcd(a, m) &= \gcd(m - q_1 a, a) = \gcd(r_1, a) = \gcd(r_1, a - q_2 r_1) = \cdots\\ &= \gcd(r_{K-1}, r_{K-2} - q_K r_{K-1}) = \gcd(q_{K+1} r_K, r_K) = r_K. \end{aligned}$$

□


For instance, to compute gcd(437, 12):

$$437 = 36 \cdot 12 + 5,\quad 12 = 2 \cdot 5 + 2,\quad 5 = 2 \cdot 2 + 1,\quad 2 = 2 \cdot 1 + 0,$$

and gcd(437, 12) = 1 is obtained. Whenever gcd(a, m) = 1 holds, the steps of the Euclidean Algorithm can be reversed to obtain $a^{-1} \pmod m$. For instance, successively solving backwards for each remainder in this example gives

$$\begin{aligned} 1 &= 5 - 2 \cdot 2 = 5 - 2 \cdot (12 - 2 \cdot 5) = 5 \cdot 5 - 2 \cdot 12\\ &= 5 \cdot (437 - 36 \cdot 12) - 2 \cdot 12 = 5 \cdot 437 - 182 \cdot 12\\ &\equiv -182 \cdot 12 \equiv 255 \cdot 12 \pmod{437}, \end{aligned}$$

and we see that 255 is the inverse of 12 in $\mathbb{Z}_{437}$. This method works in general and is the basis for the following result.


Theorem 8.1.8. Let m be any positive integer and a be any integer. Then a is invertible modulo m iff gcd(a, m) = 1.²

Proof. If a is invertible, there exists x such that $ax \equiv 1 \pmod m$, which means that $ax + my = 1$ for some y. Therefore, the gcd must divide 1, and so equals 1. This proves the "only if" direction. On the other hand, when gcd(a, m) = 1 we can "solve the Euclidean Algorithm backwards" to obtain x, y such that $1 = ax + my$, and $ax \equiv 1 \pmod m$. □

²The Euler phi function, $\varphi(m)$, counts the number of invertible elements modulo m.
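Solving the Euclidean Algorithm backwards is usually organized as the extended Euclidean algorithm. This form carries the coefficients x and y along with the remainders, instead of substituting back at the end. The following Python sketch is one standard iterative version (the function names are ours). For m = 437 and a = 12 it reproduces the inverse 255 found above.

```python
def extended_gcd(a, m):
    """Return (g, x, y) with g = gcd(a, m) and a*x + m*y == g, for a >= 0, m > 0."""
    old_r, r = a, m
    old_x, x = 1, 0
    old_y, y = 0, 1
    # Invariant: a*old_x + m*old_y == old_r and a*x + m*y == r.
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inverse_mod(a, m):
    """Inverse of a modulo m; by Theorem 8.1.8 it exists iff gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError(f"{a} is not invertible modulo {m}")
    return x % m


print(inverse_mod(12, 437))  # 255
```

On Python 3.8 and later, the built-in `pow(12, -1, 437)` computes the same inverse.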


Combining these results gives a general answer to the first question at 
the beginning of this chapter. 


Theorem 8.1.9. If S is the function in (8.5), then every orbit of S is periodic iff $\gcd(c_k, m) = 1$. In particular, when the modulus m is prime, every solution to a linear recurrence modulo m is periodic.
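Theorem 8.1.9 can be tested exhaustively for small second-order cases. The Python sketch below uses hypothetical helper names and builds the function S of (8.5) for k = 2. It checks directly whether every point of $\mathbb{Z}_m^2$ returns to itself within $m^2$ steps, which is the bound from Theorem 8.1.1. It then compares that answer with the gcd criterion.

```python
from itertools import product
from math import gcd


def every_orbit_periodic(c1, c2, c3, m):
    """Is every orbit of S(x1, x2) = (x2, c2*x1 + c1*x2 + c3 mod m) periodic?"""
    def S(p):
        return (p[1], (c2 * p[0] + c1 * p[1] + c3) % m)

    for start in product(range(m), repeat=2):
        p = S(start)
        for _ in range(m * m):  # a period is at most m^2 (Theorem 8.1.1)
            if p == start:
                break
            p = S(p)
        else:
            return False        # start never came back, so it is not periodic
    return True


for m in range(2, 10):
    for c2 in range(1, m):      # c_k must be nonzero mod m
        for c1, c3 in [(1, 0), (0, 1), (2, 5)]:
            assert every_orbit_periodic(c1, c2, c3, m) == (gcd(c2, m) == 1)
print("Theorem 8.1.9 agrees with brute force for 2 <= m < 10")
```

The direct check costs on the order of $m^4$ steps, so it is only a sanity check. The gcd test from Theorem 8.1.9 is what one would actually use.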


8.1.2 Fast modular computations 


In this section we describe Montgomery multiplication, a quick way to com- 
pute the product a * b (mod m). The method was first suggested by Peter 
Montgomery [115] in 1985. The technique is helpful for implementing mod- 
ular exponentiation used in many cryptosystems, for example, in RSA [137] 
and in any RSA-type key exchange in other cryptographic methods. 

The technique relies on the fact that there are moduli for which arith- 
metic is quick, for instance a computer’s machine word size. For any such 
modulus r and any m with gcd(m, r) = 1, Montgomery multiplication
translates operations mod m to the faster operations mod r. Articles [166, 
58, 167] can be consulted for a discussion of an efficient hardware imple- 
mentation of Montgomery multiplication. In particular, [58] claims that 
their implementation is twice as fast as the methods previously used for 
modular arithmetic. (Refer also to [14].) 

We may assume that our factors a and b are greater than 0 and less than m. We write our modulus m and each of our factors in their base-r representation, where the digits are indicated by subscripts, for example, $a = a_0 + a_1 r + \cdots + a_k r^k$, and each $a_i$ satisfies $0 \le a_i < r$. Since we are assuming that $1 = \gcd(m, r) = \gcd(-m_0, r)$, then $-m_0$ is an invertible element of $\mathbb{Z}_r$, and $-m_0 n \equiv 1 \pmod r$ for some $1 \le n < r$. Computing n is a one-time calculation.

Montgomery multiplication (as given in [58]) involves the calculation of a sequence $R_0, R_1, \ldots, R_k$ of integers for which $r^{k+1}R_k \equiv ab \pmod m$ and $0 \le R_k < 2m$. Then either $R_k$ or $R_k - m$ lies in $\{0, 1, \ldots, m-1\}$, and multiplying it by $r^{k+1}$ and reducing mod m gives the required product ab mod m. (Notice that if r is a power of 2, then $r^{k+1}R_k$ can be calculated easily from $R_k$ by shifting. More general r's are considered in [167].)

The sequence $(R_i)$ is inductively calculated in tandem with another sequence, which we call $(Q_i)$. For the choice of $Q_0 = a_0 b_0 n \bmod r$, the natural number $a_0 b + Q_0 m$ is divisible by r and is congruent to $a_0 b \pmod m$. We set $R_0 = (a_0 b + Q_0 m)/r$. Continuing, for $Q_1 = (R_0 + a_1 b_0) n \bmod r$, it can be checked that $R_0 + a_1 b + Q_1 m$ is divisible by r and is congruent to $R_0 + a_1 b \pmod m$. Setting $R_1 = (R_0 + a_1 b + Q_1 m)/r$, we have

$$r^2 R_1 = r R_0 + a_1 r b + Q_1 m r \equiv a_0 b + a_1 r b = (a_0 + a_1 r) b \pmod m.$$


Here's the algorithm:

PROCEDURE MULT(a, b)
    R := 0
    FOR i := 0 TO k DO
        Q := (R + a_i b_0) n mod r
        R := (R + a_i b + Q m)/r
    RETURN R

Note that only the least-significant digit $b_0$ of b is needed to compute each Q. Refer to Exercise 8.5 for a justification of the algorithm. Because of the initial investment of time for the computation of $-m^{-1} \bmod r$ and the final adjustment of multiplying by $r^{k+1}$, the algorithm is not particularly helpful for calculating just one product.
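Here is a minimal Python rendering of MULT, under stated assumptions. Python integers stand in for machine words. The value k + 1 is taken to be the number of base-r digits of m, so both factors, being less than m, fit in k + 1 digits. The final subtraction of m and the multiplication by $r^{k+1}$ are done in separate steps so that the result can be checked against `a * b % m`. The function and parameter names are our own.

```python
from math import gcd


def base_r_digits(x, r, length):
    """The `length` lowest base-r digits of x, least significant first."""
    digits = []
    for _ in range(length):
        digits.append(x % r)
        x //= r
    return digits


def montgomery_mult(a, b, m, r):
    """Run the R_i / Q_i recurrence; return (R, k) with r**(k+1) * R ≡ a*b (mod m).

    Assumes gcd(m, r) == 1 and 0 <= a, b < m. The returned R satisfies 0 <= R < m.
    """
    assert gcd(m, r) == 1 and 0 <= a < m and 0 <= b < m
    k = 0
    while r ** (k + 1) <= m:   # k + 1 = number of base-r digits of m
        k += 1
    n = pow(-m % r, -1, r)     # one-time step: -m0 * n ≡ 1 (mod r); Python 3.8+
    b0 = b % r                 # only this digit of b enters each Q
    R = 0
    for a_i in base_r_digits(a, r, k + 1):
        Q = (R + a_i * b0) * n % r
        R = (R + a_i * b + Q * m) // r  # exact division: the numerator is ≡ 0 (mod r)
    if R >= m:                 # the loop keeps 0 <= R < 2m
        R -= m
    return R, k


def mod_mult(a, b, m, r):
    """Recover a*b mod m with the final multiplication by r**(k+1)."""
    R, k = montgomery_mult(a, b, m, r)
    return R * pow(r, k + 1, m) % m


print(mod_mult(123, 321, 437, 16) == 123 * 321 % 437)  # True
```

In a real implementation, r would be a power of 2 such as $2^{32}$, so that `% r` and `// r` become masking and shifting. Here, ordinary Python integer operations make the arithmetic easy to follow.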


8.2 Finite Fields 


In this section we begin the study of finite fields, generalizations of the 
integers modulo a prime. They provide a more sophisticated context for 
investigating modular recurrences. Evariste Galois [66] was the first to use 
some properties of finite fields, and the first systematic theory was written 
by Leonard E. Dickson in [53]. Because they’re finite, recurrences in finite 
fields satisfy the periodicity results proved in Section 8.1. 

Let F be a finite set with two operations, + and *. Then F is a finite 
field under these operations if the following hold: 


1. F is an abelian group under +, which means that + is an associative and commutative operation on F; + has a special identity element denoted by 0 such that $a + 0 = a$ holds for all $a \in F$, and each element a has an (additive) inverse $b \in F$ satisfying $a + b = 0$.

2. The non-zero elements of F form an abelian group under *, which as above means that * is an associative and commutative operation and has a special identity element denoted by 1 such that $a * 1 = a$ holds for all $a \in F$, and each non-zero element a has a (multiplicative) inverse $b \in F$ satisfying $a * b = 1$.

3. The operations of + and * are connected by the distributive law, which means that $a_1 * (a_2 + a_3) = a_1 * a_2 + a_1 * a_3$ for all $a_1, a_2, a_3 \in F$.
For example, $\mathbb{Z}_5 = \{0, 1, 2, 3, 4\}$ is a field under the operations of addition and multiplication mod 5, where the additive and multiplicative identities are respectively 0 and 1; 0 is its own additive inverse; and 1, 2, 3, 4 have additive inverses 4, 3, 2, 1 and multiplicative inverses 1, 3, 2, 4. The associative, commutative, and distributive properties are inherited from the set of integers. Although $\mathbb{Z}_4 = \{0, 1, 2, 3\}$ is an abelian group under the operation of addition mod 4, the fact that gcd(2, 4) = 2 implies that 2 is not invertible under multiplication mod 4, and $\mathbb{Z}_4$ is therefore not a field.


Theorem 8.2.1. $\mathbb{Z}_m$ is a finite field under the operations of addition mod m and multiplication mod m iff m is prime.

Proof. As in the two examples above, we observe that the associative, commutative, and distributive properties are all inherited from the integers. Also, 0 is the additive identity and $\mathbb{Z}_m$ is an abelian group under addition mod m. By Theorem 8.1.8, a is invertible mod m iff gcd(a, m) = 1. Ensuring that gcd(a, m) = 1 for all $1 \le a < m$ is equivalent to requiring that m be prime. □


The number of elements in a finite field must be a prime power. (Refer to Exercise 8.6.)

Let $F_4$ be the set of polynomials $\{0, 1, x, x+1\}$. If we define addition on $F_4$ to be the usual polynomial addition followed by the reduction of the coefficients mod 2, it can be checked that $F_4$ is an abelian group under addition. If we define multiplication as in Table 8.1,

TABLE 8.1. The multiplication table for the field with four elements

      *   |  0    1    x    x+1
    ------+---------------------
      0   |  0    0    0    0
      1   |  0    1    x    x+1
      x   |  0    x    x+1  1
     x+1  |  0    x+1  1    x

then 1 is the multiplicative identity, and each of the non-zero elements 1, x, x + 1 has a multiplicative inverse (respectively 1, x + 1, x). The associative law for multiplication and the distributive law are tedious to check, but $F_4$ satisfies these laws.

It's helpful to examine this example further. In earlier chapters the notation R[x] meant the set of polynomials with coefficients from R. Likewise, F[x] will be the set of polynomials with coefficients from a field F. For instance, $\mathbb{Z}_2[x]$ is the set of all polynomials whose coefficients are either 0 or 1. We're usually interested not only in a set of polynomials, but also in its algebraic properties. In F[x] we can define the polynomial operations of addition and multiplication similarly to the operations in R[x], with the coefficients calculated using the operations in F. For example, you can check that when $f(x) = x^2 + x$ and $g(x) = x^3 + x + 1$ are considered as elements of $\mathbb{Z}_2[x]$, their sum and product are

$$f(x) + g(x) = x^3 + x^2 + 1 \quad\text{and}\quad f(x)g(x) = x^5 + x^4 + x^3 + x.$$
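Polynomials over $\mathbb{Z}_2$ are conveniently stored as bit masks, with bit i holding the coefficient of $x^i$; this encoding is a common choice that we assume here. Under it, addition is bitwise XOR and multiplication is shift-and-add without carries. This Python sketch, with hypothetical helper names, reproduces the sum and product just computed.

```python
def add_gf2(f, g):
    """Sum in Z_2[x]: coefficients add mod 2, which is XOR on bit masks."""
    return f ^ g


def mul_gf2(f, g):
    """Product in Z_2[x]: XOR in a shifted copy of f for each 1-bit of g."""
    product = 0
    while g:
        if g & 1:
            product ^= f
        f <<= 1
        g >>= 1
    return product


def show(p):
    """Render a bit mask as a polynomial in x, highest degree first."""
    names = {0: "1", 1: "x"}
    terms = [names.get(i, f"x^{i}") for i in range(p.bit_length() - 1, -1, -1)
             if (p >> i) & 1]
    return " + ".join(terms) or "0"


f = 0b110   # x^2 + x
g = 0b1011  # x^3 + x + 1
print(show(add_gf2(f, g)))  # x^3 + x^2 + 1
print(show(mul_gf2(f, g)))  # x^5 + x^4 + x^3 + x
```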


This discussion allows us to describe the four-element field $F_4$ above in another way. Let $S$ be the set of all constant and linear polynomials in $\mathbb{Z}_2[x]$. Since $S$ is an abelian group under addition in $\mathbb{Z}_2[x]$, we have a suitable addition for $S$. What about multiplication? The set $S$ is not closed under the usual polynomial multiplication in $\mathbb{Z}_2[x]$, because for instance $(x+1)^2 = x^2 + 1$ is not an element of $S$. But a slight modification of the usual multiplication using the polynomial $f(x) = x^2 + x + 1$ (which is not in $S$) works. Namely, the product of two elements in $S$ is defined by first obtaining the product $p(x)$ of the two polynomials as elements of $\mathbb{Z}_2[x]$ and then "reducing $p(x)$ mod $f(x)$," in other words, using the Division Algorithm in $\mathbb{Z}_2[x]$ to find the unique remainder when $p(x)$ is divided by $f(x)$. Since we're dividing by a quadratic polynomial, the Division Algorithm always yields a remainder in the set $S$. For example, $x(x+1) = x^2 + x = f(x) + 1$ gives the product $x * (x+1) = 1$ in $S$. (In practice, $p(x) \bmod f(x)$ is often computed by repeated subtraction of multiples of $f(x)$ rather than by using long division.)
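The following Python sketch builds the multiplication of $F_4$ exactly as described: multiply in $\mathbb{Z}_2[x]$, then reduce modulo $f(x) = x^2 + x + 1$ by repeatedly subtracting shifted copies of $f(x)$. It uses a bit-mask representation as an illustrative assumption (bit $k$ is the coefficient of $x^k$), and the helper names are hypothetical.

```python
F_POLY = 0b111  # f(x) = x^2 + x + 1


def clmul(p: int, q: int) -> int:
    """Product in Z_2[x] of bit-mask polynomials (bit k = coefficient of x^k)."""
    result = 0
    while q:
        if q & 1:
            result ^= p
        p <<= 1
        q >>= 1
    return result


def reduce_mod(p: int, m: int = F_POLY) -> int:
    """Remainder of p(x) divided by m(x): subtract (XOR) shifted copies of m(x)."""
    deg_m = m.bit_length() - 1
    while p.bit_length() - 1 >= deg_m:
        p ^= m << (p.bit_length() - 1 - deg_m)   # cancel the leading term of p(x)
    return p


def f4_mul(p: int, q: int) -> int:
    """Multiplication in F_4 = {0, 1, x, x + 1}."""
    return reduce_mod(clmul(p, q))


NAMES = {0b00: "0", 0b01: "1", 0b10: "x", 0b11: "x+1"}
for p in NAMES:                      # print the multiplication table of F_4
    print(" ".join(NAMES[f4_mul(p, q)].rjust(3) for q in NAMES))
```

Every non-zero row of the printed table contains a 1, which is the existence of inverses noted at the start of this discussion.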

This procedure can be generalized to obtain a finite field with $p^m$ elements for any prime $p$ and positive exponent $m$. The construction relies on the fact that for every $p$ and $m$ there is an irreducible polynomial $f(x)$ in $\mathbb{Z}_p[x]$ with $\deg(f) = m$, where irreducible means that $f(x)$ cannot be factored into polynomials of smaller degree in $\mathbb{Z}_p[x]$. The construction and more information about finite fields can be found in [98, p. 91 ff]. (Also, refer to Exercises 8.8-8.12 at the end of this chapter.) Moreover, for each choice of prime $p$ and exponent $m$ there is essentially only one finite field with $p^m$ elements. These fields are frequently denoted by $GF(p^m)$ and are the finite Galois fields, named in honor of E. Galois. In what follows we drop the $*$ symbol for multiplication, and instead use juxtaposition of elements to denote multiplication. Our proof of Theorem 8.1.9 generalizes to finite fields.


Theorem 8.2.2. In a finite field every linear recurrence is periodic. 


8.3 Periods of First-Order Modular Recurrences


In what follows we will allow $R$ to be either the set of integers mod $m$ or a finite field and investigate the period of first-order linear recurrences $s_{j+1} = a s_j + b$ in $R$, where $a \neq 0$. We first note that when $a = 1$, then $s_n = s_0 + nb$, and the period of $S(x) = x + b$ is therefore the least integer $t \ge 1$ such that $tb = 0$ in $R$. We next consider the case in which $a - 1$ is invertible.
Lemma 8.3.1. Let $a$ be an element of $R$ such that $a - 1$ has a multiplicative inverse $c$ in $R$. If $(s_j)$ is a solution to the recurrence $s_{j+1} = a s_j + b$ in $R$, then

$$s_{n+j} = a^j s_n + bc(a^j - 1) \quad \text{for all } n, j. \tag{8.6}$$

Proof. For fixed $n$ we prove (8.6) by induction on $j \ge 1$. Since $c(a - 1) = 1$ holds in $R$,
$$s_{n+1} = a s_n + b = a s_n + bc(a - 1).$$
Assuming that (8.6) holds for all $1 \le j < J$,
$$\begin{aligned} s_{n+J} &= a s_{n+J-1} + b = a\bigl(a^{J-1} s_n + bc(a^{J-1} - 1)\bigr) + bc(a - 1) \\ &= a^J s_n + bc(a^J - a + a - 1) = a^J s_n + bc(a^J - 1), \end{aligned}$$
completing the proof. □
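A quick numerical check of (8.6) may help. The Python sketch below is only illustrative: the modulus 21 and the coefficients $a = 12$, $b = 4$ are our choice (they match the example used later in this section), and `check_86` is a hypothetical helper name. It compares both sides of (8.6) for every starting value and for small $n$ and $j$.

```python
M, A, B = 21, 12, 4          # illustrative choice: s_{j+1} = 12 s_j + 4 in Z_21
C = pow(A - 1, -1, M)        # inverse c of a - 1 = 11 modulo 21 (Python 3.8+)


def orbit(s0: int, length: int) -> list:
    """First `length` terms of s_{j+1} = A*s_j + B (mod M), starting from s0."""
    seq = [s0 % M]
    for _ in range(length - 1):
        seq.append((A * seq[-1] + B) % M)
    return seq


def check_86(s0: int, n_max: int = 10, j_max: int = 10) -> bool:
    """Compare s_{n+j} with a^j s_n + b c (a^j - 1) for 0 <= n < n_max, 1 <= j <= j_max."""
    s = orbit(s0, n_max + j_max + 1)
    for n in range(n_max):
        for j in range(1, j_max + 1):
            aj = pow(A, j, M)
            if s[n + j] != (aj * s[n] + B * C * (aj - 1)) % M:
                return False
    return True


print(C)                                     # 2
print(all(check_86(s0) for s0 in range(M)))  # True
```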


Theorem 8.3.2. Let $S$ be the function defined on $R$ by $S(x) = ax + b$, where $b$ is arbitrary and $a - 1$ is any element of $R$ that has a multiplicative inverse $c$ in $R$. Then $s_0 = -bc$ is the only fixed point of $S$, and every $s_0 \in R$ is an eventually periodic point of $S$ whose period equals the least positive integer $t$ such that $(a^t - 1)\bigl(S^{(n+1)}(s_0) - S^{(n)}(s_0)\bigr) = 0$ holds for all sufficiently large $n$.


Proof. We see that $s_0$ is a fixed point iff $a s_0 + b = s_0$, which can be uniquely solved for $s_0 = -bc$. From Theorem 8.1.6, every orbit of $S$ is eventually periodic, and if $t$ is any multiple of the period of $s_0$, then $s_{n+t} = s_n$ for sufficiently large $n$. By (8.6) this becomes
$$a^t s_n + bc(a^t - 1) = s_n, \quad\text{that is,}\quad (a^t - 1)(s_n + bc) = 0.$$
Multiplying the last equation by $a - 1$ gives
$$0 = (a^t - 1)\bigl((a - 1)s_n + b\bigr) = (a^t - 1)(s_{n+1} - s_n).$$
Because $a - 1$ is invertible in $R$, each of these steps is reversible, and we obtain
$$s_{n+t} = s_n \iff (a^t - 1)(s_{n+1} - s_n) = 0,$$
which completes the proof. □
Let us consider orbits of $S(x) = 12x + 4$ on $R = \mathbb{Z}_{21}$. Since $a - 1 = 11$ has inverse $c = 11^{-1} = 2$ in $\mathbb{Z}_{21}$, $-bc = -8 = 13$ is the only fixed point of $S$. For instance, the orbits of 1 and 2 are
$$1, 16, 7, 4, 10, 19, 1, \ldots \quad\text{and}\quad 2, 7, 4, 10, 19, 1, 16, 7, \ldots,$$
where 1 is a periodic point, while 2 is eventually periodic. The periods are the same, but is that only because 7 is a common term of both orbits? In Exercise 8.15 you're asked to find the periods of other orbits under this map.
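The orbits above can be reproduced with a short Python sketch. It records the index at which each value first appears, so the first repeated value reveals both the preperiod (tail) and the period; the function name and default arguments are our own illustrative choices.

```python
def tail_and_period(s0: int, a: int = 12, b: int = 4, m: int = 21) -> tuple:
    """Return (preperiod length, period) of the orbit of s0 under S(x) = a x + b mod m."""
    seen = {}                 # value -> index of its first appearance in the orbit
    s, i = s0 % m, 0
    while s not in seen:
        seen[s] = i
        s = (a * s + b) % m
        i += 1
    return seen[s], i - seen[s]


print(tail_and_period(1))   # (0, 6): 1 is periodic
print(tail_and_period(2))   # (1, 6): 2 is eventually periodic
print(tail_and_period(13))  # (0, 1): the fixed point
```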

When $R$ is a finite field, the situation is quite simple. Every orbit under $S(x) = ax + b$ is periodic, and the period is the same for every initial value other than a fixed point.


Theorem 8.3.3. Consider the iteration of $S(x) = ax + b$ in the finite field $F$. When $a = 1$ and $b \neq 0$, the period of every orbit is the characteristic of the field $F$, the least positive integer $n$ such that the sum of $n$ copies of any element in $F$ is zero. (Refer to Exercise 8.4.) When $a \neq 1$, let $c$ be the multiplicative inverse of $a - 1$. Then $x = -bc$ is the only fixed point of $S$, and for all other $x \neq -bc$ the period equals the period of 1 under the linear function $S_0(x) = ax$. This number is called the order of $a$ in $F$ and will be denoted by $\operatorname{ord}(a)$.


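Proof. From our remark before Lemma 8.3.1, the period of $S(x) = x + b$ is the least $t \ge 1$ with $tb = 0$, which for $b \neq 0$ (an invertible element) is the characteristic of $F$. For any $S(x) = ax + b$ with $a \neq 1$, $S$ is a bijection, so the orbit of any $s_0 \neq -bc$ never reaches the fixed point. Hence it is periodic with period $t \ge 2$, $s_{m+1} \neq s_m$ for all $m$, and $s_{m+1} - s_m$ is an invertible element of $F$. Therefore, the condition in the last theorem becomes $a^t = 1$, and the period is $t = \operatorname{ord}(a)$. □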
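Theorem 8.3.3 is easy to test by brute force in a small prime field. The Python sketch below (an illustration, not a proof) takes $F = \mathbb{Z}_{13}$, an arbitrary choice, computes $\operatorname{ord}(a)$ by repeated multiplication, and checks that every non-fixed orbit of $S(x) = ax + b$ has period $\operatorname{ord}(a)$, and that the case $a = 1$, $b \neq 0$ gives period 13.

```python
def mult_order(a: int, p: int) -> int:
    """Smallest t >= 1 with a^t = 1 in Z_p (a must be non-zero mod p)."""
    t, x = 1, a % p
    while x != 1:
        x = x * a % p
        t += 1
    return t


def orbit_period(x0: int, a: int, b: int, p: int) -> int:
    """Period of x0 under S(x) = a x + b in Z_p (a != 0, so S is a bijection and orbits are cycles)."""
    x, t = (a * x0 + b) % p, 1
    while x != x0:
        x = (a * x + b) % p
        t += 1
    return t


p = 13
for a in range(2, p):                     # a != 0, 1
    c = pow(a - 1, -1, p)                 # inverse of a - 1 (Python 3.8+)
    for b in range(p):
        fixed = (-b * c) % p
        for x0 in range(p):
            expected = 1 if x0 == fixed else mult_order(a, p)
            assert orbit_period(x0, a, b, p) == expected

# a = 1, b != 0: every orbit has period p, the characteristic of Z_p
assert all(orbit_period(x0, 1, b, p) == p for b in range(1, p) for x0 in range(p))
print("Theorem 8.3.3 confirmed in Z_13")
```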


We can rewrite the fact that $\operatorname{ord}(a)$ is the period of the sequence $(a^i)$ as

$$a^i = a^j \iff i \equiv j \pmod{\operatorname{ord}(a)}, \tag{8.7}$$

a useful observation in what follows. The next algebraic result is an aid for computing $\operatorname{ord}(a)$. Its statement appeared in a letter
written by Pierre de Fermat to Frénicle de Bessy in 1640, and was later 
proved by Gottfried Leibniz. It’s usually called Fermat’s Little Theorem, to 
differentiate it from Fermat’s Last Theorem, which was proved in 1994 by 
Andrew Wiles with the assistance of Richard Taylor [168, 158]. Alf van der 
Poorten [162] has published a fairly accessible account of both the history 
and the proof of Fermat’s Last Theorem, the culmination of the work of 
many mathematicians. 


Theorem 8.3.4 (Fermat's Little Theorem). If $F$ is a finite field with $q$ elements, then every non-zero $a \in F$ satisfies $a^{q-1} = 1$, and so $\operatorname{ord}(a)$ divides $q - 1$.


Proof. Let $a$ be a fixed non-zero element of $F$, and let $a_1, \ldots, a_{q-1}$ be any listing of all non-zero elements in $F$. If $a^{-1}$ is the multiplicative inverse of $a$, then for any $i, j$,
$$a a_i = a a_j \iff a^{-1} a a_i = a^{-1} a a_j \iff a_i = a_j,$$
which means that the sets $\{a a_1, \ldots, a a_{q-1}\}$ and $\{a_1, \ldots, a_{q-1}\}$ are equal. Because multiplication is commutative, the product of the elements in the second set equals the product of the elements in the first set,
$$a_1 \cdots a_{q-1} = (a a_1) \cdots (a a_{q-1}) = a^{q-1}(a_1 \cdots a_{q-1}).$$
The product $a_1 \cdots a_{q-1}$ is a non-zero element of the field $F$, and multiplying the above equation by this element's inverse yields $a^{q-1} = 1$. □


Corollary 8.3.5. The period of any first-order linear recurrence with $a \neq 1$ in a finite field with $q$ elements divides $q - 1$.


We now use this result to determine the periods of $S(x) = 215x + 3$ in $\mathbb{Z}_m$ for the prime $m = 12323$, without actually calculating any orbit. Since $m$ is a prime, $F = \mathbb{Z}_m$ is a finite field, and all orbits are periodic. From Exercise 8.3, 5816 is the multiplicative inverse of $a - 1 = 214$ in $F$, and $s_0 = -3 \cdot 5816 = 7198$ is the only fixed point of $S$. The period of any $s_0 \neq 7198$ equals the order of 215 in $F$, and it remains to find $t = \operatorname{ord}(215)$. By Fermat's Little Theorem, $t$ divides $m - 1 = 12322 = (2)(61)(101)$, and there are eight possible values for $t$. (What are these eight values?) For each divisor $k$ of 12322, $a^k$ can be calculated by fast exponentiation, the same process that we have already used several times to find powers of matrices. For instance, $101 = 2^6 + 2^5 + 2^2 + 1$ gives
$$a^{101} = a^{64} a^{32} a^{4} a \equiv 2451 \cdot 1305 \cdot 4040 \cdot 215 \equiv 1 \pmod{m}.$$
Since 101 is prime and $215 \neq 1$, $\operatorname{ord}(215) = 101$, and $S$ has one fixed point and $12322/101 = 122$ orbits whose period is 101.
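The divisor search just described can be sketched in Python. We rely on Python's built-in three-argument `pow`, which performs fast modular exponentiation, and on `pow(x, -1, m)` for modular inverses (Python 3.8+); the helper names are hypothetical. Under these assumptions the sketch reproduces the inverse, the fixed point, the powers used above, and $\operatorname{ord}(215) = 101$.

```python
def divisors(n: int) -> list:
    """All positive divisors of n, in increasing order."""
    small = [d for d in range(1, int(n ** 0.5) + 1) if n % d == 0]
    return sorted(set(small + [n // d for d in small]))


def order_via_divisors(a: int, p: int) -> int:
    """ord(a) in Z_p for a prime p: by Fermat's Little Theorem it is the least divisor k of p - 1 with a^k = 1."""
    return next(k for k in divisors(p - 1) if pow(a, k, p) == 1)


m, a, b = 12323, 215, 3
c = pow(a - 1, -1, m)                             # inverse of a - 1 = 214
print(c, (-b * c) % m)                            # 5816 7198 (inverse and fixed point)
print(len(divisors(m - 1)))                       # 8 candidate values for t
print(pow(a, 64, m), pow(a, 32, m), pow(a, 4, m)) # 2451 1305 4040
t = order_via_divisors(a, m)
print(t, (m - 1) // t)                            # 101 122 (period, number of non-fixed orbits)
```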

We close this section with one more algebraic result. Every odd element in $\mathbb{Z}_8$ (there are four of them) is a root of the quadratic equation $x^2 - 1 = 0$. This overabundance of roots (there are more than $\deg(x^2 - 1)$ of them) can never happen when the coefficient set is a field. (Refer to Exercise 8.23.) This simple fact can have some surprising consequences. For instance, if $F$ is a field with $q = p^m$ elements, then the order of a non-zero $s \in F$ divides $p - 1$ iff $s$ is a root of $x^{p-1} - 1 = 0$, so there are at most $p - 1$ elements of $F$ whose order divides $p - 1$. Fermat's Little Theorem gives the following result, which will be useful later.


Theorem 8.3.6. Let $F$ be a finite field with $q = p^m$ elements. If $s$ is a non-zero element of $F$, then $\operatorname{ord}(s)$ divides $p - 1$ iff $s \in \mathbb{Z}_p$.


8.3.1 First-order modular recurrences with maximal period 


In this section we prove that any finite field with $q$ elements has an element whose order is $q - 1$, which we know is the largest possible order. These elements are called primitive elements of $F$. The orbits of $S(x) = ax + b$ are easily described when $a$ is a primitive element: there are only two orbits, the fixed point and everything else! The reason for this is that Theorem 8.3.3 can be applied, since (for $q > 2$) $a$ cannot be either 0 or 1, and so both $a$ and $a - 1$ are invertible elements of $F$. The next lemma is used in our proof of the existence of primitive elements.


Lemma 8.3.7. Let $a, b$ be non-zero elements of $F$ with $\operatorname{ord}(a) = r$ and $\operatorname{ord}(b) = s$. If $\gcd(r, s) = 1$, then $\operatorname{ord}(ab) = rs$.
Proof. Let $n = \operatorname{ord}(ab)$. Then $(ab)^n = 1$. Also,
$$1 = \bigl((ab)^n\bigr)^r = (ab)^{rn} = (a^r)^n b^{rn} = b^{rn},$$
and $rn$ must be a multiple of $\operatorname{ord}(b) = s$. Since $\gcd(r, s) = 1$, this implies that $s$ divides $n$. Interchanging the roles of $a$ and $b$ we obtain that $n$ is a common multiple of $r$ and $s$. Again using the fact that $\gcd(r, s) = 1$, we have $\operatorname{lcm}(r, s) = rs$, and so $rs$ divides $n$. We complete the proof by showing that $n$ divides $rs$. (Since $rs$ and $n$ are both positive, this gives $n = rs$.) For this,
$$(ab)^{rs} = (a^r)^s (b^s)^r = 1,$$
implying that $rs$ is a multiple of $\operatorname{ord}(ab) = n$. □


Theorem 8.3.8 (Primitive Element Theorem). Every finite field with $q$ elements has at least one element whose order equals $q - 1$.


Proof. Let $N$ be the maximum element in the finite set $\{\operatorname{ord}(b) : b \in F,\ b \neq 0\}$, and let $d$ be some element of $F$ with $\operatorname{ord}(d) = N$. Fermat's Little Theorem implies that $N \le q - 1$. If we can prove that the order of every non-zero element divides $N$, then each of the $q - 1$ non-zero elements of $F$ would satisfy $x^N - 1 = 0$, and since a polynomial of degree $N$ over a field has at most $N$ roots (Exercise 8.23), $N \ge q - 1$ would also hold.

By way of contradiction, we assume that there exists an element $c \in F$ whose order does not divide $N$, from which it follows that there exists a prime $p$ such that the highest power $p^\gamma$ of $p$ dividing $\operatorname{ord}(c)$ does not divide $N$. Let $\operatorname{ord}(c) = p^\gamma N_1$, where $\gcd(N_1, p) = 1$, and let $\beta < \gamma$ be such that $N = p^\beta N_2$ with $\gcd(N_2, p) = 1$. For $a = c^{N_1}$ and $b = d^{p^\beta}$ we have $\operatorname{ord}(a) = p^\gamma$ and $\operatorname{ord}(b) = N_2$. Since these orders are relatively prime, the last lemma implies that $\operatorname{ord}(ab) = p^\gamma N_2 > p^\beta N_2 = N$, contrary to the maximality of $N$, and so the order of every element of $F$ does in fact divide $N$, giving $N = q - 1$. □


In Sections 69-75 of his famous Disquisitiones Arithmeticae [70], Carl Friedrich Gauss considered the question of finding primitive elements for a prime modulus. Gauss wrote (in Latin) "Euler admits that it is extremely difficult to pick out these numbers (primitive elements) and that their nature is one of the deepest mysteries of numbers." Finding a primitive element is still considered a hard problem. It is related to the discrete log problem, the basis for the (assumed) security of several cryptosystems.

In 1927 Emil Artin first conjectured that every nonsquare positive integer is a primitive element for infinitely many prime moduli. Some specific examples of this conjecture had already been constructed in the years between Gauss' work and 1927. For instance, 2 was already known to be a primitive element for all primes $p$ of the form $p = 2q + 1$, where $q$ is a prime with $q \equiv 1 \pmod 4$. (Primes of the form $2q + 1$ with $q$ prime are now often called "safe primes.") In [80] Christopher Hooley proved that Artin's conjecture would follow from the Generalized Riemann Hypothesis, a conjecture that is widely believed to be true but is considered to be very difficult to prove.
Although it is hard to find primitive elements, our proof of the Primitive Element Theorem can be combined with trial and error to construct primitive elements. For example, to find a primitive element in the finite field $F = \mathbb{Z}_{31}$, we could first try $a = 2$. Since the powers of 2 modulo 31 are $2, 4, 8, 16, 1$, we have $\operatorname{ord}(2) = 5 < 30$, and 2 is not a primitive element. We notice that 5 does not occur in this sequence and compute the powers of 5 mod 31. These are $5, 25, 1$, which gives $\operatorname{ord}(5) = 3$ in $F$. Since $\gcd(\operatorname{ord}(2), \operatorname{ord}(5)) = 1$, the lemma gives $\operatorname{ord}(2 \cdot 5) = 15$. Also, $\operatorname{ord}(-1) = 2$ and $\gcd(15, 2) = 1$ imply $\operatorname{ord}(21) = \operatorname{ord}(-10) = 30$, and 21 is a primitive element in $F$.
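The trial-and-error search in $\mathbb{Z}_{31}$ can be replayed with a few lines of Python. The order is computed naively by repeated multiplication, which is fine for such a small field; the final brute-force count, only feasible for small $q$, confirms that there are $\varphi(30) = 8$ primitive elements.

```python
def mult_order(a: int, p: int) -> int:
    """Smallest t >= 1 with a^t = 1 mod p (a not divisible by p)."""
    t, x = 1, a % p
    while x != 1:
        x = x * a % p
        t += 1
    return t


p = 31
print([pow(2, k, p) for k in range(1, 6)])   # [2, 4, 8, 16, 1]
print(mult_order(2, p), mult_order(5, p))    # 5 3
print(mult_order(10, p))                     # 15 (Lemma 8.3.7, gcd(5, 3) = 1)
print(mult_order(-10 % p, p))                # 30, so 21 = -10 is primitive
primitive = [g for g in range(1, p) if mult_order(g, p) == p - 1]
print(len(primitive))                        # 8 = phi(30)
```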

We can use the existence of primitive elements to derive the following 
somewhat surprising result. 


Corollary 8.3.9. If $F$ is a finite field with $q$ elements, then for any divisor $d$ of $q - 1$ there exists a first-order recurrence in $F$ whose period is $d$.

Proof. Let $a$ be any primitive element in $F$. For any divisor $d$ of $q - 1$, there exists $k$ such that $kd = q - 1$. The period of the recurrence $s_{n+1} = a^k s_n + b$ (for any $b$) is $\operatorname{ord}(a^k) = d$, since by (8.7) $(a^k)^j = 1$ iff $kd$ divides $kj$, that is, iff $d$ divides $j$. □
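To see Corollary 8.3.9 in action, the following Python sketch uses the primitive element 21 of $\mathbb{Z}_{31}$ found in the example above and, for each divisor $d$ of 30, iterates $s_{n+1} = a^k s_n$ with $k = 30/d$. Taking $b = 0$ and $s_0 = 1$ is our simplification for illustration.

```python
p, g = 31, 21                       # 21 is a primitive element of Z_31 (see the example above)


def period(s0: int, a: int, b: int, p: int) -> int:
    """Period of s0 under s -> a s + b in Z_p (a != 0, so every orbit is a cycle)."""
    x, t = (a * s0 + b) % p, 1
    while x != s0:
        x = (a * x + b) % p
        t += 1
    return t


for d in [d for d in range(1, p) if (p - 1) % d == 0]:
    k = (p - 1) // d
    a = pow(g, k, p)                # a = g^k has order d
    print(d, period(1, a, 0, p))    # the two columns agree
```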


8.4 Periodic Second-Order Modular Recurrences


In the last section we proved that when the modulus is prime, the period of all non-fixed points under a first-order modular recurrence $s_{n+1} = a s_n + b$ with $a \neq 1$ is the order of the coefficient $a$. For higher order sequences the situation is less clear and seems to be more complicated.

Our analysis for second-order recurrences involves the matrix form of the recurrence. Let $R$ be either a finite field or $\mathbb{Z}_m$ for some $m \ge 2$, and let $(s_j)$ be a solution to a second-order homogeneous recurrence in $R$,
$$s_{n+2} = c_1 s_{n+1} + c_2 s_n, \qquad c_2 \neq 0.$$
Although linear algebra cannot be used here unless $R$ is a field, matrices are still a convenient way to represent higher order recurrences. Recall that the companion matrix of the recurrence is
$$A = \begin{pmatrix} c_1 & c_2 \\ 1 & 0 \end{pmatrix}, \tag{8.8}$$
and consecutive pairs $S_j = (s_{j+1}, s_j)^T$ are connected by the matrix equation $S_n = A S_{n-1}$, implying
$$S_n = A^n S_0.$$


When $c_2$ is invertible in $R$, it can be checked that
$$A^{-1} = \begin{pmatrix} 0 & 1 \\ c_2^{-1} & -c_1 c_2^{-1} \end{pmatrix},$$
and $A$ is an element of the finite set $\mathcal{S}$ of invertible $2 \times 2$ matrices with entries in $R$. In Exercise 8.14 you prove that this implies there exists a minimal positive integer $k$ (called the order of the matrix) such that $A^k = I$, from which we get $S_k = A^k S_0 = S_0$, proving that $k$ is a multiple of the period of the sequence. Although the order of the matrix can be larger than the period (see Exercise 8.25), we will next obtain conditions on the initial values that ensure that the period equals the order of the matrix.

Let $T$ be the map $T(S) = AS$ defined on $R^2$. Then $T$ is a linear map since it satisfies $T(cS + X) = cT(S) + T(X)$ even when $R$ is not a field. If the first two state vectors, $S_0 = (s_1, s_0)^T$ and $S_1 = (s_2, s_1)^T$, form a spanning set for $R^2$, then each linear map $T^n$ is determined by its effect on the set $\{S_0, S_1\}$. In particular, $A^n = I$ iff both $T^n(S_0) = S_0$ and $T^n(S_1) = S_1$, which implies that the order of the matrix equals the period of the solution. We've proved the following theorem.


Theorem 8.4.1. Let $R$ be either a finite field or $\mathbb{Z}_m$ for some $m \ge 2$. If $s_{n+2} = c_1 s_{n+1} + c_2 s_n$ is a second-order recurrence in $R$ and $c_2$ is invertible, then every solution is periodic and the period always divides the order of the companion matrix of the recurrence. Moreover, if $\{S_0, S_1\}$ is a spanning set for $R^2$, then the period equals the order of the companion matrix.
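Theorem 8.4.1 can be checked numerically. The Python sketch below, with hypothetical helper names, computes the order of the companion matrix over $\mathbb{Z}_m$ by repeated multiplication and compares it with the period of a solution. For the Fibonacci recurrence mod 6 it illustrates a spanning pair of initial states as well as non-spanning ones.

```python
def mat_mul(X, Y, m):
    """Product of 2x2 matrices over Z_m."""
    return [[(X[0][0] * Y[0][0] + X[0][1] * Y[1][0]) % m, (X[0][0] * Y[0][1] + X[0][1] * Y[1][1]) % m],
            [(X[1][0] * Y[0][0] + X[1][1] * Y[1][0]) % m, (X[1][0] * Y[0][1] + X[1][1] * Y[1][1]) % m]]


def matrix_order(c1, c2, m):
    """Least k >= 1 with A^k = I for A = [[c1, c2], [1, 0]] over Z_m (c2 invertible, m >= 2)."""
    A = [[c1 % m, c2 % m], [1, 0]]
    identity = [[1, 0], [0, 1]]
    P, k = A, 1
    while P != identity:
        P, k = mat_mul(P, A, m), k + 1
    return k


def solution_period(c1, c2, s0, s1, m):
    """Period of s_{n+2} = c1 s_{n+1} + c2 s_n over Z_m, tracking the state S_n = (s_{n+1}, s_n)."""
    start = (s1 % m, s0 % m)
    x, y, t = (c1 * s1 + c2 * s0) % m, s1 % m, 1
    while (x, y) != start:
        x, y, t = (c1 * x + c2 * y) % m, x, t + 1
    return t


# Fibonacci mod 6: S_0 = (1, 0)^T and S_1 = (1, 1)^T span Z_6^2, so the period is the matrix order
print(matrix_order(1, 1, 6), solution_period(1, 1, 0, 1, 6))            # 24 24
# Initial states that do not span Z_6^2 can give a smaller period dividing 24
print(solution_period(1, 1, 2, 2, 6), solution_period(1, 1, 3, 3, 6))   # 8 3
```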


Since our proof relied only on the invertibility of the companion matrix, an analogue of this result holds for $k$th order recurrences. (Refer to Exercise 8.27.)

When $R$ is a field and the eigenvalues of $A$ are distinct, it can be checked that any non-trivial $S_0, S_1$ are linearly independent unless $S_0$ is an eigenvector of $A$, and we have the following theorem.


Theorem 8.4.2. Let $F$ be a finite field and let $s_{n+2} = c_1 s_{n+1} + c_2 s_n$ be a second-order recurrence in $F$ with two distinct eigenvalues $\lambda_1 \neq \lambda_2$ in $F$. Then every non-zero solution is periodic, and its period equals the order of the companion matrix unless it has the form $s_n = \gamma \lambda_i^n$, in which case the period is $\operatorname{ord}(\lambda_i)$.


8.4.1 Periods of modular Fibonacci sequences 


We now specialize to the Fibonacci recurrence, and first answer the questions posed at the beginning of this chapter. As already noted, every Fibonacci orbit is periodic, and the period mod 6 is $\operatorname{lcm}(3, 8) = 24$. If $(f_j)$ is the Fibonacci sequence, we noted in Chapter 2 that the orbit of any $(a, b) \in \mathbb{Z}^2$ under $F(x, y) = (y, x + y)$ is the sequence $(b f_{j+1} + a f_j)$ (where $f_{-1} = 1$). Therefore, the period of any orbit under $F$ modulo 6 must divide the period of the orbit of $(0, 1)$, and so 24 is the largest period obtained from the Fibonacci recurrence mod 6.

Our final question was how many different orbits are generated by the Fibonacci recurrence mod 6, where two orbits are considered the same when they're translates of each other. First note that $(0, 0)$ is the only fixed point. Among the $6^2 - 1 = 35$ non-zero elements of $\mathbb{Z}_6^2$, the usual Fibonacci sequence (which is the orbit of $(0, 1)$) generates 24 different pairs of consecutive elements, and this leaves $35 - 24 = 11$ non-zero pairs to be accounted for. In the orbit of $(2, 2)$, every element of the orbit is even, and so its period equals the period of $(1, 1)$ mod 3, which we've already calculated to be 8. (Because its period is not 24, this orbit is completely disjoint from the orbit of $(0, 1)$.) Similarly, since the period of $(1, 1)$ mod 2 is 3, the period of the orbit of $(3, 3)$ mod 6 is 3. These two orbits account for the other 11 elements and give a total of three different non-zero orbits under the Fibonacci recurrence. (Also refer to [163].)
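The orbit count can be confirmed by partitioning all of $\mathbb{Z}_6^2$ into cycles of $F(x, y) = (y, x + y)$. Since $F$ is invertible, every point lies on a cycle, so the Python sketch below (helper name ours) simply walks each cycle once and records its length.

```python
def cycle_lengths(m: int) -> list:
    """Sorted lengths of the cycles of F(x, y) = (y, x + y) on Z_m^2 (F is invertible)."""
    unseen = {(x, y) for x in range(m) for y in range(m)}
    lengths = []
    while unseen:
        start = unseen.pop()
        x, y = start[1], (start[0] + start[1]) % m    # apply F once
        n = 1
        while (x, y) != start:
            unseen.discard((x, y))
            x, y = y, (x + y) % m
            n += 1
        lengths.append(n)
    return sorted(lengths)


print(cycle_lengths(6))   # [1, 3, 8, 24]
```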

What about the period of the Fibonacci sequence for other moduli? A 
paper by D.D. Wall [165] was the first systematic approach to this problem, 
and although many of his results have been generalized, his 1960 paper 
contains most of what is currently known about modular Fibonacci periods. 

Recalling that the period mod $m = p_1^{e_1} \cdots p_r^{e_r}$ is the least common multiple of the periods mod $p_i^{e_i}$, it is enough to consider moduli that are powers of a prime number. Also, if $t = t(p^{j+1})$ is the period of the Fibonacci sequence mod $p^{j+1}$ for any $j \ge 1$, then
$$f_t \equiv 0 \pmod{p^{j+1}} \quad\text{and}\quad f_{t+1} \equiv 1 \pmod{p^{j+1}}$$
are congruences that also hold modulo all divisors of $p^{j+1}$. Therefore, for all $i \le j$ it is also true that $(f_t, f_{t+1}) \equiv (0, 1) \pmod{p^i}$, and we have
$$t(p^{j+1}) \text{ is divisible by } t(p^j).$$


Since the first two state vectors, $S_0 = (1, 0)^T$ and $S_1 = (1, 1)^T$, form a spanning set for every $\mathbb{Z}_m^2$, the period is always the order of the companion matrix
$$F = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.$$
Using the fact that $F^2 = I + F$, it can be shown (refer to Exercise 8.26) that for all $n \ge 1$, $t(2^n) = 3 \cdot 2^{n-1}$. The periods $t(5^n) = 4 \cdot 5^n$ can be found in a similar manner.

What about powers of other primes $p \neq 2, 5$? It has been conjectured that $t(p^j) = p^{j-1} t(p)$ holds for all prime powers, and this was verified by Wall for all $p < 10{,}000$. His paper would be very long indeed if for each prime less than 10,000 he performed an induction similar to what we have just outlined for $p = 2$. Rather, he proved the surprising fact that if $p$ is a prime for which $t(p^2) \neq t(p)$, then it is true that $t(p^{j+1}) = p^j t(p)$ for all $j \ge 1$. We don't give his argument here but simply comment that he used several combinatorial identities to derive the result. In January 2003 Jonathan Goff, a graduate student in mathematics at Oregon State University, extended Wall's calculations by verifying that $t(p^2) \neq t(p)$ for all primes less than one million and for all primes $p \equiv \pm 1 \pmod{10}$ that are less than twenty million.

Let's look again at the Fibonacci sequence mod 5. Rather than computing all terms until the second occurrence of 0, 1, we'll consider what happens algebraically. In $\mathbb{Z}_5[x]$ the characteristic polynomial $ch(x) = x^2 - x - 1$ factors as $ch(x) = (x - 3)^2$. Since $\lambda = 3$ is a repeated eigenvalue and $\mathbb{Z}_5$ is a field,³ there exists a polynomial $p$ with $\deg(p) \le 1$ such that
$$f_n = p(n)\,3^n \quad\text{for all } n \ge 0.$$


(Note that this is written as an equation in $\mathbb{Z}_5$, and is actually the same as saying that $f_n \equiv p(n)3^n \pmod 5$.) From $f_0 = 0$ and $f_1 = 1$ we obtain $p(n) = 3^{-1} n$, which gives
$$f_n = n\,3^{n-1} \quad\text{for all } n \ge 0.$$
If $t = t(5)$ is the period, in $\mathbb{Z}_5$ we have
$$t\,3^{t-1} = f_t = f_0 = 0 \quad\text{and}\quad (t+1)\,3^{t} = f_{t+1} = f_1 = 1,$$
which can be solved to find that $t$ is the least positive integer that simultaneously satisfies $t \equiv 0 \pmod 5$ and $3^t \equiv 1 \pmod 5$. Since $\operatorname{ord}(3) = 4$ in $\mathbb{Z}_5$, the period is $t = \operatorname{lcm}(4, 5) = 20$.
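The period formulas above are not proved here, but they are easy to test for small exponents. The following Python sketch, with a hypothetical `pisano` helper that iterates until the pair $(0, 1)$ returns, checks $t(2^n) = 3 \cdot 2^{n-1}$ and $t(5^n) = 4 \cdot 5^n$ for a few $n$, together with the closed form $f_n \equiv n\,3^{n-1} \pmod 5$.

```python
def pisano(m: int) -> int:
    """Period t(m) of the Fibonacci sequence mod m (m >= 2): first t >= 1 with (f_t, f_{t+1}) = (0, 1)."""
    a, b, t = 1, 1, 1                  # (f_1, f_2)
    while (a, b) != (0, 1):
        a, b, t = b, (a + b) % m, t + 1
    return t


# t(2^n) = 3 * 2^(n-1) and t(5^n) = 4 * 5^n for small n
for n in range(1, 7):
    assert pisano(2 ** n) == 3 * 2 ** (n - 1)
for n in range(1, 4):
    assert pisano(5 ** n) == 4 * 5 ** n

# Closed form mod 5: f_n = n * 3^(n-1); pow(3, -1, 5) = 2 covers n = 0 (Python 3.8+)
f, g = 0, 1                            # (f_n, f_{n+1}) mod 5
for n in range(100):
    assert f == n * pow(3, n - 1, 5) % 5
    f, g = g, (f + g) % 5

print(pisano(5))                       # 20 = lcm(4, 5)
```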

For this argument to work for other primes $p \neq 2, 5$, we'd like to find the eigenvalues of the recurrence modulo $p$, and in order to do so, we must factor the characteristic polynomial. But $ch(x)$ might be an irreducible quadratic in $\mathbb{Z}_p[x]$. When is $ch(x)$ irreducible? Since a quadratic polynomial is irreducible in $\mathbb{Z}_p[x]$ iff it has no roots in $\mathbb{Z}_p$, we'll use the technique of completing the square to determine whether $ch(x)$ has roots. The oddness of $p$ means that 2 is invertible, say $2c = 1$ in $\mathbb{Z}_p$, and
$$ch(x) = x^2 - x - 1 = x^2 - 2cx - 4c^2 = (x - c)^2 - 5c^2,$$
giving $ch(x + c) = x^2 - 5c^2$. Since $ch(x)$ is irreducible exactly when $ch(x + c)$ is irreducible, we obtain
$$ch(x) \text{ is reducible in } \mathbb{Z}_p[x] \iff p \neq 2 \text{ and } x^2 - 5c^2 \text{ has roots in } \mathbb{Z}_p.$$
The last restriction is the same as requiring that $5c^2$, and so 5, is a square in $\mathbb{Z}_p$ (which is often called a quadratic residue). For what odd primes does this happen? For this type of question, number theorists invoke the famous Law of Quadratic Reciprocity, which was first stated by Legendre and proved by Gauss [70, Section 123].⁴ Using quadratic reciprocity, it's possible to prove that the primes $p \equiv \pm 1 \pmod{10}$ are exactly the odd primes for which 5 is a non-zero square in $\mathbb{Z}_p$. These are therefore the primes $p$ for which the Fibonacci recurrence mod $p$ has eigenvalues in $\mathbb{Z}_p$. What about the other odd primes, $p \equiv \pm 3 \pmod{10}$? For those primes $ch(x)$ is irreducible in $\mathbb{Z}_p[x]$. But (refer to Exercise 8.12) $K = \mathbb{Z}_p[x]/(ch(x))$ is a field with $p^2$ elements, and $ch(x)$ has a root in $K$.

³Note that this argument cannot be used for $m = 25$, since $\mathbb{Z}_{25}$ is not a field.
⁴Consult the website http://www.rzuser.uni-heidelberg.de/~hb3/rchrono.html for a chronology of the many proofs of the Law of Quadratic Reciprocity.
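The criterion for when 5 is a square can be tested numerically. The Python sketch below swaps in Euler's criterion ($a$ is a non-zero square mod an odd prime $p$ iff $a^{(p-1)/2} \equiv 1$) in place of quadratic reciprocity, and double-checks by brute force that $ch(x) = x^2 - x - 1$ has a root in $\mathbb{Z}_p$ exactly when $p \equiv \pm 1 \pmod{10}$, for primes below 2000 (an arbitrary bound).

```python
def is_square_mod(a: int, p: int) -> bool:
    """Euler's criterion: for an odd prime p not dividing a, a is a square mod p iff a^((p-1)/2) = 1 mod p."""
    return pow(a, (p - 1) // 2, p) == 1


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


for p in primes_up_to(2000):
    if p in (2, 5):
        continue
    expected = p % 10 in (1, 9)                   # p = +1 or -1 (mod 10)
    assert is_square_mod(5, p) == expected
    has_root = any((x * x - x - 1) % p == 0 for x in range(p))
    assert has_root == expected                   # ch(x) = x^2 - x - 1 has a root in Z_p
print("checked all primes below 2000")
```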

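Setting $K = \mathbb{Z}_p[x]/(ch(x))$ for primes $p \equiv \pm 3 \pmod{10}$, and setting $K = \mathbb{Z}_p$ for primes $p \equiv \pm 1 \pmod{10}$, for each prime $p \neq 2, 5$ we have defined a field $K$ in which $ch(x)$ has roots. We next show that these eigenvalues are different. If $\lambda_1 = \lambda_2$ were to hold, then $(x - \lambda_1)^2 = x^2 - x - 1$, and $\lambda_1^2 = -1$. Since $ch(\lambda_1) = 0$, then $\lambda_1^2 = \lambda_1 + 1$, from which we get $\lambda_1 = -2$. Therefore, $5 = ch(-2) = 0$ in $K$, contradicting $p \neq 5$. So, for $p \neq 2, 5$, the modular Fibonacci matrix $F$ has distinct eigenvalues and so is diagonalizable. The change of basis matrix is the invertible matrix
$$B = \begin{pmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{pmatrix},$$
with
$$B^{-1} F B = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \quad\text{and}\quad B^{-1} F^n B = \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix}.$$
This means that $\operatorname{lcm}(\operatorname{ord}(\lambda_1), \operatorname{ord}(\lambda_2))$ is the order of $F$, which equals $t(p)$ (recall Theorem 8.4.2).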

You might rightfully question the helpfulness of relating the period to the 
order of certain elements because we have already said that calculating the 
order of an element is a hard problem. (In particular, refer to our discussion 
of Artin’s conjecture on primitive elements.) The answer to this objection 
is that this connection allows us to prove other results about the period, 
and the next theorem is an example of this. (A different proof of this result 
can be found in [165, Theorems 6, 7].) 


Theorem 8.4.3. For prime $p \neq 2, 5$, let $t$ be the period of the Fibonacci sequence modulo $p$.
(a) If $p \equiv \pm 1 \pmod{10}$, then $t$ divides $p - 1$.
(b) If $p \equiv \pm 3 \pmod{10}$, then $t$ divides $2(p + 1)$, and also $(2p + 2)/t$ is odd.


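Proof. From $ch(x) = x^2 - x - 1 = (x - \lambda_1)(x - \lambda_2)$ we have $\lambda_1 \lambda_2 = -1$, so $\lambda_2 = -\lambda_1^{-1}$. Since $\operatorname{ord}(\lambda) = \operatorname{ord}(\lambda^{-1})$ and $\operatorname{ord}(-1) = 2$, then
$$\operatorname{lcm}(\operatorname{ord}(\lambda_1), \operatorname{ord}(\lambda_2)) = \operatorname{lcm}(\operatorname{ord}(\lambda_1), 2),$$
and the above discussion gives $t = \operatorname{lcm}(\operatorname{ord}(\lambda_1), 2)$.

First considering the case in which $p \equiv \pm 1 \pmod{10}$, recall that $K = \mathbb{Z}_p$, and by Fermat's Little Theorem we know that the order of the non-zero element $\lambda_1 \in K$ divides $p - 1$, an even number, and so $\operatorname{lcm}(\operatorname{ord}(\lambda_1), 2)$ does divide $p - 1$.

In the other case, in which $p \equiv \pm 3 \pmod{10}$, the field $K$ contains an element $\alpha$ such that $\alpha^2 = 5$. Since 5 is an element of $\mathbb{Z}_p$, then $\alpha^{2(p-1)} = 5^{p-1} = 1$. This means that $x = \alpha^{p-1}$ is a root of the polynomial $x^2 - 1 = 0$, which has only the two roots $x = \pm 1$ in any field. Since $\alpha \notin \mathbb{Z}_p$, we know that $\alpha^{p-1} \neq 1$ (refer to Theorem 8.3.6), and so $\alpha^{p-1} = -1$ follows. What does this have to do with $\operatorname{ord}(\lambda_1)$? In the "completing the square" argument above we had $ch(x) = (x - c)^2 - 5c^2$, where $2c = 1$. Because $ch(\lambda_1) = 0$, $(\lambda_1 - c)^2 = 5c^2 = \alpha^2 c^2$, implying $\lambda_1 = c(1 \pm \alpha)$, and we can assume the positive sign occurs. Since $K$ has $p^2$ elements, then (refer again to Exercise 8.7)
$$(1 + \alpha)^p = 1 + \alpha^p = 1 + \alpha\,\alpha^{p-1} = 1 - \alpha,$$
which when combined with $c^{p-1} = 1$ gives
$$\lambda_1^{p+1} = c^2 (1 - \alpha)(1 + \alpha) = c^2(1 - \alpha^2) = c^2(1 - 5) = -(2c)^2 = -1.$$
Then $\lambda_1^{2(p+1)} = 1$, and so $\operatorname{ord}(\lambda_1)$ divides $2(p + 1)$. Further, writing $2(p + 1) = 2^k n$ where $n$ is odd, we know that $\operatorname{ord}(\lambda_1)$ divides $2^k n$ and doesn't divide $2^{k-1} n = p + 1$ (since $\lambda_1^{p+1} \neq 1$). This means that $2^k$ divides $\operatorname{ord}(\lambda_1)$; in particular $\operatorname{ord}(\lambda_1)$ is even, and $t = \operatorname{lcm}(\operatorname{ord}(\lambda_1), 2) = \operatorname{ord}(\lambda_1)$ with $(2p + 2)/t$ odd. □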
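Theorem 8.4.3 can be spot-checked by computing periods directly. This Python sketch (naive iteration, adequate for $p < 1000$) asserts both cases of the theorem and collects the primes $p \equiv \pm 3 \pmod{10}$ whose period falls short of $2p + 2$, together with the quotient $(2p + 2)/t$. It is a numerical illustration only; we make no claim beyond what the loop itself checks.

```python
def pisano(p: int) -> int:
    """Period of the Fibonacci sequence mod p (p >= 2), by direct iteration."""
    a, b, t = 1, 1, 1
    while (a, b) != (0, 1):
        a, b, t = b, (a + b) % p, t + 1
    return t


def primes_below(n: int) -> list:
    """Primes below n by trial division (fine for small n)."""
    return [q for q in range(2, n) if all(q % d for d in range(2, int(q ** 0.5) + 1))]


exceptional = []                               # (p, (2p + 2)/t) for periods below 2p + 2
for p in primes_below(1000):
    if p in (2, 5):
        continue
    t = pisano(p)
    if p % 10 in (1, 9):                       # case (a): p = +1 or -1 (mod 10)
        assert (p - 1) % t == 0
    else:                                      # case (b): p = +3 or -3 (mod 10)
        assert (2 * p + 2) % t == 0 and ((2 * p + 2) // t) % 2 == 1
        if t != 2 * p + 2:
            exceptional.append((p, (2 * p + 2) // t))
print(exceptional)
```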


We calculated $t(p)$ for all primes $p < 1000$ and found that the upper bound in the theorem is attained quite often for these primes. Of the seventy-eight primes that are congruent to $\pm 1 \pmod{10}$, the period equals the upper bound for forty of them. The situation for the "irreducible" primes is even stronger. Among all primes less than 1000 there are eighty-eight primes congruent to $\pm 3 \pmod{10}$, and for all but fifteen of these the period equals the upper bound. Table 8.2 lists the irreducible primes with smaller periods, and the second column gives the quotient $(2p + 2)/t$.


TABLE 8.2. The exceptional primes congruent to $\pm 3 \pmod{10}$


(2p + 2)/t 


47, 107, 113, 233, 353, 563, 677, 743, 977 
307, 797 

263,557,953 

967 


We close this discussion by computing the Fibonacci period $t(2^3 \cdot 3^2 \cdot 17 \cdot 47^2)$. Corollary 8.1.4 implies that the period is $\operatorname{lcm}(t(8), t(9), t(17), t(47^2))$, where by Wall's calculations we know that
$$t(8) = 2^2\,t(2), \quad t(9) = 3\,t(3), \quad\text{and}\quad t(47^2) = 47\,t(47).$$
From the earlier calculations given in Section 8.1, we have $t(2) = 3$ and $t(3) = 8$. The remaining two primes $p = 17, 47$ are congruent to $-3 \pmod{10}$, and from Table 8.2 we see that 17 is not exceptional, which means that $t(17) = 2(18) = 36$, while 47 is exceptional with quotient 3, so that $t(47) = 2(48)/3 = 32$. Combining these facts gives the period
$$\operatorname{lcm}(12, 24, 36, (32)(47)) = (32)(9)(47) = 13536.$$
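The final computation can also be confirmed by brute force. The Python sketch below computes each prime-power period by direct iteration, combines them with `math.lcm` (Python 3.9+), and then iterates the Fibonacci recurrence modulo the full modulus $2^3 \cdot 3^2 \cdot 17 \cdot 47^2 = 2703816$ as an independent check.

```python
from math import lcm   # math.lcm with several arguments requires Python 3.9+


def pisano(m: int) -> int:
    """Period of the Fibonacci sequence mod m (m >= 2), found by direct iteration."""
    a, b, t = 1, 1, 1
    while (a, b) != (0, 1):
        a, b, t = b, (a + b) % m, t + 1
    return t


factors = [2 ** 3, 3 ** 2, 17, 47 ** 2]
parts = [pisano(q) for q in factors]
print(parts)              # [12, 24, 36, 1504]
print(lcm(*parts))        # 13536
m = 2 ** 3 * 3 ** 2 * 17 * 47 ** 2
print(pisano(m))          # 13536, computed directly mod 2703816
```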
8.5 Applications


8.5.1 Application 1: Pseudorandom number generation 


Random numbers are often needed in scientific computation, especially simulations and probabilistic algorithms that have some stochastic components. They are also used in cryptology and even computer games. The term "random sequence" means a sequence that passes certain statistical tests. For instance, one simple randomness requirement for sequences from $\mathbb{Z}_{10}$ might be the property that each digit should occur about the same number of times, so that in suitably long subsequences each decimal digit occurs close to one-tenth of the time. In contrast to sequences generated mechanically or by a physical phenomenon, sequences generated by a mathematical iterative process (such as a recurrence) can be proved to have good statistical properties. Another advantage of such deterministic procedures is that they will dependably generate the same sequence when the initial conditions are unchanged. This allows for the exact reproduction of data for numerical experiments. Sequences that are generated deterministically and have good statistical properties are called sequences of pseudorandom numbers (PRNs). Because a PRN generator cannot satisfy all possible statistical properties (which is another reason for calling them pseudorandom), the practitioner should know which statistical properties are required for the application before choosing the generator. Chapter 3 of Knuth [88] and Chapter 7 in Niederreiter [119] can be consulted for information on the common statistical tests used to test for pseudorandomness. In this section we concentrate on two structural properties, the period length and the lattice structure.

A linear modular recurrence is the simplest example of a deterministic process that can be used for generating PRNs. These are frequently called linear congruential generators. From the perspective of PRN generation, the Primitive Element Theorem guarantees that every prime $p$ has at least one linear PRN generator mod $p$ whose period is $p - 1$, the longest possible period for a generator $ax + b$ with $a \neq 1$. Knuth [88, Chapter 3] has an extensive discussion of the periods for general moduli. For many years after their introduction by D.H. Lehmer [94] in 1949, linear congruential generators were the preferred method of PRN generation. However, an article by G. Marsaglia [106] raised some serious questions about their use by proving that every linear PRN generator has an inherent lattice structure. As he said in the paper, "for the past 20 years such regularity might have produced bad, but unrecognized, results in Monte Carlo studies...." In a later article [107] he developed the Lattice Test for PRNs, which can be stated as follows.

For any deterministic sequence $(s_n)$ (with period $N \ge 2$) generated in a finite field $F$ and each $d \ge 1$, define the $d$-dimensional points $U_1, \ldots, U_{N-1}$ by
$$U_i = (s_i - s_0,\ s_{i+1} - s_1,\ \ldots,\ s_{i+d-1} - s_{d-1}) \in F^d,$$
and let $L_d$ be the subspace of $F^d$ spanned by these points. For instance, $L_1 = F$, since it contains $U_1 = s_1 - s_0 \neq 0$, while
$$L_2 \supseteq \operatorname{Span}\{U_1, U_2\} = \operatorname{Span}\{(s_1 - s_0,\ s_2 - s_1),\ (s_2 - s_0,\ s_3 - s_1)\},$$
and $L_2$ may have dimension one or two over $F$. The lattice dimension for the sequence $(s_n)$ is defined to be the largest integer $D \ge 1$ for which $L_D = F^D$. When the lattice dimension is small, the points display a high degree of regularity, and so the generation is predictable and not random. A linear generator fares poorly under this test, since an inductive argument shows that $s_n = a s_{n-1} + b$ satisfies
$$s_{n+i} - s_i = a^i (s_n - s_0) \quad\text{for all } i,$$
and the points $U_i$ therefore have the form $U_i = (s_i - s_0)(1, a, a^2, \ldots, a^{d-1})$. This means that its lattice dimension is always 1.
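The Lattice Test is straightforward to implement for small prime fields. The Python sketch below is a minimal version under our own assumptions: vectors are reduced mod a prime $p$, rank is computed by Gauss-Jordan elimination, and indices are read modulo the period $N$ (legitimate because the sequence is periodic). For the illustrative linear generator $s_{n+1} = 3s_n + 1$ mod 31 it reports lattice dimension 1, as the argument above predicts.

```python
def rank_mod_p(rows, p):
    """Rank over Z_p (p prime) of a list of equal-length integer vectors, by Gauss-Jordan elimination."""
    rows = [[x % p for x in r] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue                                   # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)              # scale the pivot row so the pivot is 1
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):                     # clear this column in every other row
            if i != rank and rows[i][col]:
                factor = rows[i][col]
                rows[i] = [(x - factor * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def lattice_dimension(seq, p, d_max):
    """Largest D <= d_max with L_D = F^D, given one full period `seq` of (s_n); indices wrap mod N."""
    n_period = len(seq)
    dim = 0
    for d in range(1, d_max + 1):
        points = [[(seq[(i + j) % n_period] - seq[j]) % p for j in range(d)]
                  for i in range(1, n_period)]         # U_1, ..., U_{N-1}
        if rank_mod_p(points, p) < d:
            break                                      # L_d != F^d, hence L_{d+1} != F^{d+1} too
        dim = d
    return dim


p, a, b = 31, 3, 1            # illustrative linear generator; 3 has order 30 mod 31
seq, x = [], 0
while True:                   # collect one full period starting from s_0 = 0
    seq.append(x)
    x = (a * x + b) % p
    if x == seq[0]:
        break
print(len(seq), lattice_dimension(seq, p, 5))   # 30 1
```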

The remainder of this discussion on PRNs is devoted to an analysis of a commonly used nonlinear generator whose lattice dimension mod $p$ has been proved to be at least $(p + 1)/2$, much better than linear generators. The generator is called an inversive generator on the finite field $F$ and has the form $f(x) = ax^{-1} + b$ (where we set $0^{-1} = 0$, but otherwise the arithmetic is performed in $F$). Because $f$ is a one-to-one function on $F$ (for $a \neq 0$), any sequence generated by the first-order recurrence $x_{n+1} = f(x_n)$ is periodic, and the number of elements in $F$ is an upper bound on the period. Below we show that any finite field has inversive generators with this maximal period.

Although the Euclidean Algorithm can be used to compute inverses quickly, finding inverses does take longer than performing "polynomial" operations, and so in this respect inversive generators are slower. On the other hand, research has shown that inversive generators do usually have better statistical properties than polynomial generators. (Refer to [119, Chapter 8].) For instance, extensive computations by Poul Petersen [127] show that for all primes $5 \le p < 10^5$ the lattice dimension of inversive generators mod $p$ with maximal period is either $p - 6$, $p - 4$, or $p - 2$. (Table 8.3 gives more details. Also, in [64] it's shown that the lattice dimension of a maximal period inversive generator is always odd.)

TABLE 8.3. Lattice dimensions for maximal period inversive generators mod $p$ for $5 \le p < 10^5$

Number of primes less than $10^5$                 9,590
Number of maximal period inversive generators     5,579,945,320,208
Number whose dimension is $p - 6$                 1
Number whose dimension is $p - 4$                 1,829
Number whose dimension is $p - 2$                 5,579,945,318,378

As said above, we will show that every finite field has inversive generators of maximal period. The polynomial $\mathrm{ch}_f(x) = x^2 - bx - a$ associated with the generator $f(x) = ax^{-1} + b$ is useful in this analysis. For example, the generator $f(x) = 3x^{-1} + 2$ in $F = \mathbb{Z}_{11}$ has $\mathrm{ch}_f(x) = x^2 - 2x - 3$, which factors as $\mathrm{ch}_f(x) = (x - 3)(x - 10)$ in $F[x]$. Its roots $x = 3, 10$ are the only fixed points of $f$, and every other element of $F$ is in the orbit of $x = 2$, which is $2, 9, 6, 8, 1, 5, 7, 4, 0$. This example illustrates a general property of an inversive generator with $b \neq 0$: its fixed points are precisely the roots of $\mathrm{ch}_f(x)$ in the field $F$. To see this, first note that 0 is not a root of $\mathrm{ch}_f(x)$ because $a \neq 0$, and 0 is not a fixed point of $f$ because $f(0) = b \neq 0$. (If $b = 0$, then 0 is an additional fixed point.) Therefore, any root or fixed point $c$ is invertible, and multiplying by $c^{-1}$ yields

$$ \mathrm{ch}_f(c) = c^2 - bc - a = 0 \iff c^2 = bc + a \iff c = b + ac^{-1} = f(c). $$
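Here is a brute-force check of this property on $\mathbb{Z}_{11}$, as a hedged Python sketch (the function names are ours). It skips $b = 0$ on purpose: since $f(0) = b$, taking $b = 0$ makes 0 a fixed point of $f$ even though 0 is never a root of $\mathrm{ch}_f(x)$ when $a \neq 0$.

```python
def inversive(a, b, p):
    """f(x) = a*x^(-1) + b on Z_p (p prime), with 0^(-1) = 0."""
    return lambda x: (a * (pow(x, -1, p) if x % p else 0) + b) % p

def fixed_points(a, b, p):
    f = inversive(a, b, p)
    return {x for x in range(p) if f(x) == x}

def ch_roots(a, b, p):
    """Roots in Z_p of ch_f(x) = x^2 - b*x - a."""
    return {x for x in range(p) if (x * x - b * x - a) % p == 0}

p = 11
print(sorted(fixed_points(3, 2, p)), sorted(ch_roots(3, 2, p)))   # [3, 10] [3, 10]

# Exhaustive comparison over all a != 0 and b != 0 in Z_11.
mismatches = [(a, b) for a in range(1, p) for b in range(1, p)
              if fixed_points(a, b, p) != ch_roots(a, b, p)]
print(mismatches)                                                 # []
```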


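Because our goal is to find inversive generators with maximal period, we will consider generators that have no fixed points, which (for $b \neq 0$) is equivalent to requiring that $\mathrm{ch}_f(x)$ be irreducible in $F[x]$. As we used earlier with the Fibonacci recurrence, $K = F[x]/(\mathrm{ch}_f(x))$ is then a field in which $\mathrm{ch}_f(x)$ factors into linear factors, $\mathrm{ch}_f(x) = (x - \alpha)(x - \beta)$ in $K[x]$. We ensure that the roots are distinct by requiring that the discriminant $b^2 + 4a$ be non-zero. Although the recurrence is not linear, there is a relationship between the orbit of 0 and the roots of $\mathrm{ch}_f(x)$; namely, we will show that for every $j \ge 1$ such that none of $f^{(1)}(0), \ldots, f^{(j-1)}(0)$ equals 0,

$$ f^{(j)}(0) = \frac{\alpha^{j+1} - \beta^{j+1}}{\alpha^j - \beta^j}. \tag{8.9} $$

From $\mathrm{ch}_f(x) = (x - \alpha)(x - \beta) = x^2 - bx - a$, we have $a = -\alpha\beta$ and $b = \alpha + \beta$, and this gives

$$ f(0) = b = \alpha + \beta = \frac{\alpha^2 - \beta^2}{\alpha - \beta}, $$

which is (8.9) for $j = 1$. Further, if for some $j \ge 1$ we have

$$ f^{(j)}(0) = \frac{\alpha^{j+1} - \beta^{j+1}}{\alpha^j - \beta^j} \neq 0, $$

then

$$ f^{(j+1)}(0) = \frac{a}{f^{(j)}(0)} + b = \frac{-\alpha\beta(\alpha^j - \beta^j)}{\alpha^{j+1} - \beta^{j+1}} + \alpha + \beta = \frac{\alpha^{j+2} - \beta^{j+2}}{\alpha^{j+1} - \beta^{j+1}}, $$

and the $(j+1)$st element of the orbit of 0 is as given by (8.9).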
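Formula (8.9) can be tested numerically without constructing $K$ abstractly. In this sketch (our construction, assuming $F = \mathbb{Z}_p$ with $p$ prime), an element $u + vz$ of $K = \mathbb{Z}_p[z]/(\mathrm{ch}_f(z))$ is stored as the pair $(u, v)$, $\beta$ is the class $z$ of $x$, and $\alpha = b - \beta$ is the other root. To avoid inverses in $K$, (8.9) is checked in the cross-multiplied form $f^{(j)}(0)(\alpha^j - \beta^j) = \alpha^{j+1} - \beta^{j+1}$, together with $\alpha^j \neq \beta^j$. The parameters are those of the worked example $f(x) = 7x^{-1} + 10$ on $\mathbb{Z}_{11}$.

```python
def k_mul(x, y, a, b, p):
    """(u1 + v1*z)(u2 + v2*z) in Z_p[z]/(z^2 - b*z - a), reduced with z^2 = b*z + a."""
    (u1, v1), (u2, v2) = x, y
    return ((u1 * u2 + a * v1 * v2) % p, (u1 * v2 + v1 * u2 + b * v1 * v2) % p)

def k_pow(x, n, a, b, p):
    """Square-and-multiply exponentiation in the same ring."""
    result = (1, 0)
    while n:
        if n & 1:
            result = k_mul(result, x, a, b, p)
        x = k_mul(x, x, a, b, p)
        n >>= 1
    return result

def k_sub(x, y, p):
    return ((x[0] - y[0]) % p, (x[1] - y[1]) % p)

p, a, b = 11, 7, 10         # f(x) = 7x^(-1) + 10; ch_f(x) = x^2 - 10x - 7 = x^2 + x + 4 over Z_11
beta = (0, 1)               # beta = z, the class of x in K
alpha = (b % p, (-1) % p)   # alpha = b - beta, because alpha + beta = b

rows = []
x = 0
for j in range(1, 6):       # f^(1)(0), ..., f^(5)(0); the fifth iterate returns to 0
    x = (a * (pow(x, -1, p) if x else 0) + b) % p
    num = k_sub(k_pow(alpha, j + 1, a, b, p), k_pow(beta, j + 1, a, b, p), p)
    den = k_sub(k_pow(alpha, j, a, b, p), k_pow(beta, j, a, b, p), p)
    # (8.9) cross-multiplied: f^(j)(0) * (alpha^j - beta^j) == alpha^(j+1) - beta^(j+1)
    rows.append((j, x, den != (0, 0), k_mul((x, 0), den, a, b, p) == num))

for row in rows:
    print(row)
```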


Theorem 8.5.1. Let $F$ be a finite field with $q$ elements and let $f$ be an inversive generator on $F$ such that $\mathrm{ch}_f(x)$ is irreducible in $F[x]$ and has non-zero discriminant, and let $\beta$ be a root of $\mathrm{ch}_f(x)$ in $K = F[x]/(\mathrm{ch}_f(x))$. If $t$ is the period of $x = 0$ under $f$, then the order of $\beta^{q-1}$ in $K$ is $t + 1$.
Proof. Since $f$ is a one-to-one function, every orbit of $f$ is periodic, and the period $t$ is the least positive integer such that $f^{(t)}(0) = 0$. As above, the irreducibility of $\mathrm{ch}_f(x)$ implies that $K = F[x]/(\mathrm{ch}_f(x))$ is a field that contains elements $\alpha, \beta$ such that (8.9) holds. Therefore,

$$ f^{(t)}(0) = 0 \iff \alpha^{t+1} = \beta^{t+1} \iff (\alpha\beta^{-1})^{t+1} = 1. $$

Since $t$ is the minimal positive integer with this property, $t + 1$ must be the order of $\alpha\beta^{-1}$ in $K$. We complete the proof by showing that $\alpha = \beta^q$, so that $\alpha\beta^{-1} = \beta^{q-1}$. From Exercise 8.7 we know that $\mathrm{ch}_f(\beta^q) = (\mathrm{ch}_f(\beta))^q = 0$, which means that $\beta^q$ is also a root of $\mathrm{ch}_f(x)$. The fact that $\beta \notin F$ implies $\beta^q \neq \beta$ (refer to Exercise 8.16), and $\beta^q$ is forced to equal $\alpha$, the other root of $\mathrm{ch}_f(x)$. □
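Theorem 8.5.1 can also be confirmed exhaustively for small prime fields. The Python sketch below (our construction; the names are hypothetical) uses the pair representation $u + vz$ of $K$, computes the order of $\beta^{p-1}$ by brute force, and compares it with $t + 1$ for every $a \neq 0$ and $b$ for which $\mathrm{ch}_f(x)$ has no root in $\mathbb{Z}_p$ and $b^2 + 4a \neq 0$.

```python
def k_mul(x, y, a, b, p):
    """(u1 + v1*z)(u2 + v2*z) in Z_p[z]/(z^2 - b*z - a)."""
    (u1, v1), (u2, v2) = x, y
    return ((u1 * u2 + a * v1 * v2) % p, (u1 * v2 + v1 * u2 + b * v1 * v2) % p)

def k_pow(x, n, a, b, p):
    result = (1, 0)
    while n:
        if n & 1:
            result = k_mul(result, x, a, b, p)
        x = k_mul(x, x, a, b, p)
        n >>= 1
    return result

def k_order(g, a, b, p):
    """Multiplicative order of a non-zero element g of the field K (brute force)."""
    x, n = g, 1
    while x != (1, 0):
        x = k_mul(x, g, a, b, p)
        n += 1
    return n

def period_of_zero(a, b, p):
    f = lambda x: (a * (pow(x, -1, p) if x else 0) + b) % p
    x, t = f(0), 1
    while x != 0:
        x, t = f(x), t + 1
    return t

def theorem_851_holds(p):
    """Check ord(beta^(p-1)) == t + 1 for every qualifying generator over Z_p."""
    for a in range(1, p):
        for b in range(p):
            irreducible = all((x * x - b * x - a) % p for x in range(p))
            if not irreducible or (b * b + 4 * a) % p == 0:
                continue
            gamma = k_pow((0, 1), p - 1, a, b, p)      # beta^(q-1), with beta = z
            if k_order(gamma, a, b, p) != period_of_zero(a, b, p) + 1:
                return False
    return True

print(k_order(k_pow((0, 1), 10, 7, 10, 11), 7, 10, 11), period_of_zero(7, 10, 11))  # 6 5
print([theorem_851_holds(p) for p in (2, 3, 5, 7, 11, 13)])
```

Brute-force order computation is cheap here only because these fields are tiny.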


There is a similarity between this last result and what happens with first-order linear recurrences, since the period of 0 is related to the order of an element in an associated algebraic structure. Because calculating the order of an element can be quite lengthy, this result is more useful for theoretical than for computational purposes. For instance, it's used in [64] to show that the lattice dimension of an inversive generator is always odd.

For the sake of concreteness, let us consider the orbit of 0 in $F = \mathbb{Z}_{11}$ under $f(x) = 7x^{-1} + 10$. Then $\mathrm{ch}_f(x) = x^2 + x + 4$, which can be checked to be irreducible in $F[x]$ with non-zero discriminant. We won't explicitly determine the field $K = F[x]/(\mathrm{ch}_f(x))$, but rather recall that it's a finite field with $11^2 = 121$ elements in which $\mathrm{ch}_f(x)$ factors into two linear factors. We want to calculate or otherwise determine the order of $\gamma = \beta^{q-1} = \beta^{10}$, where $\beta \in K$ is a root of $x^2 + x + 4$. The order of $\beta$ divides $q^2 - 1 = 120$, and so the order of $\gamma = \beta^{10}$ divides 12. Calculating $\gamma = \beta^{10}$ using fast exponentiation, from $\beta^2 = -\beta - 4$ we have

$$ \beta^4 = (\beta + 4)^2 = 7\beta + 1, \qquad \beta^8 = (7\beta + 1)^2 = -2\beta + 3, $$

and

$$ \gamma = \beta^8 \cdot \beta^2 = (-2\beta + 3)(-\beta - 4) = 3\beta + 2. $$

Since

$$ \gamma^2 = (3\beta + 2)^2 = 3\beta + 1 \quad\text{and}\quad \gamma^3 = (3\beta + 1)(3\beta + 2) = -1, $$

we have $\gamma^6 = 1$ while $\gamma, \gamma^2, \gamma^3 \neq 1$, so the order of $\gamma = \beta^{10}$ is 6, which means that the period of 0 is five. This can also be verified by direct calculation, tracing the orbit of 0 as $0, 10, 3, 5, 7$.
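The same arithmetic can be traced in Python. In this hedged sketch, a pair $(u, v)$ stands for $u + v\beta$ in $K$, and products are reduced with $\beta^2 = -\beta - 4$ over $\mathbb{Z}_{11}$; the function name `k_mul` is ours.

```python
def k_mul(x, y, p=11):
    """Multiply u1 + v1*beta and u2 + v2*beta in K, using beta^2 = -beta - 4 (mod 11)."""
    (u1, v1), (u2, v2) = x, y
    # u1*u2 + (u1*v2 + v1*u2)*beta + v1*v2*beta^2, with beta^2 replaced by -4 - beta
    return ((u1 * u2 - 4 * v1 * v2) % p, (u1 * v2 + v1 * u2 - v1 * v2) % p)

beta = (0, 1)                   # the pair (u, v) stands for u + v*beta
beta2 = k_mul(beta, beta)       # (7, 10): -beta - 4
beta4 = k_mul(beta2, beta2)     # (1, 7):  7*beta + 1
beta8 = k_mul(beta4, beta4)     # (3, 9):  -2*beta + 3
gamma = k_mul(beta8, beta2)     # (2, 3):  3*beta + 2 = beta^10
gamma2 = k_mul(gamma, gamma)    # (1, 3):  3*beta + 1
gamma3 = k_mul(gamma2, gamma)   # (10, 0): -1

# Order of gamma by repeated multiplication.
power, order = gamma, 1
while power != (1, 0):
    power = k_mul(power, gamma)
    order += 1
print(order)                    # 6, so the period of 0 is 5
```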


Theorem 8.5.2. Every finite field has inversive generators with one orbit. This orbit contains every element of the field and so has the maximal period among all sequences generated by an inversive generator.


Proof. Let $F$ be a finite field with $q$ elements, and let $g$ be any irreducible quadratic polynomial in $F[x]$. (In Exercise 8.34 you show that such polynomials exist for all $q$.) From the Primitive Element Theorem, there exists $\gamma \in F[x]/(g)$ whose order is $q^2 - 1$. Therefore, $\gamma \notin F$, and the quadratic polynomial $G(x) = (x - \gamma)(x - \gamma^q)$ is irreducible in $F[x]$. For $G(x) = x^2 - Bx - A$, we consider the associated inversive generator $f(x) = Ax^{-1} + B$, which has $\mathrm{ch}_f(x) = G(x)$. Since $\mathrm{ord}(\gamma) = q^2 - 1$, then $\mathrm{ord}(\gamma^{q-1}) = q + 1$, and the period of $x = 0$ under $f$ does equal $q$. □
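For small primes, Theorem 8.5.2 can be checked by exhaustive search. The sketch below (our illustration, restricted to prime fields $\mathbb{Z}_p$; the names are hypothetical) lists the pairs $(a, b)$ for which the orbit of 0 under $f(x) = ax^{-1} + b$ has length $p$, that is, the generators with a single orbit through all of $\mathbb{Z}_p$.

```python
def period_of_zero(a, b, p):
    """Period of x = 0 under f(x) = a*x^(-1) + b on Z_p (p prime, a != 0, 0^(-1) = 0)."""
    f = lambda x: (a * (pow(x, -1, p) if x else 0) + b) % p
    x, t = f(0), 1
    while x != 0:
        x, t = f(x), t + 1
    return t

def maximal_period_generators(p):
    """All (a, b) whose orbit of 0 has length p, i.e. one orbit covering Z_p."""
    return [(a, b) for a in range(1, p) for b in range(p) if period_of_zero(a, b, p) == p]

for p in (2, 3, 5, 7, 13, 17):
    print(p, len(maximal_period_generators(p)))
```

Each count is positive, as the theorem guarantees; the search itself says nothing about large fields, where the constructive proof is what matters.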


8.5.2 Application 2: Integer factorization 


The security of the RSA cryptosystem [137] relies on the difficulty of fac- 
toring integers that are a product of two large primes. Because of this, 
there was an increase of interest in factorization algorithms after the RSA 
cryptosystem was introduced in the late 1970s. 

Factoring a natural number $m$ can be viewed as a search problem, searching the set $S = \{2, \ldots, \lfloor\sqrt{m}\rfloor\}$ for divisors of $m$, but for large $m$ searching by trial division is not practical.² Instead of searching all of $S$, most modern factoring techniques generate a subset $T$ whose elements are likely to have a factor in common with $m$, and search for $t \in T$ such that $d = \gcd(t, m) \neq 1, m$. Once such $t \in T$ has been found, the factorization $m = d \cdot \frac{m}{d}$ is obtained. (A complete factorization of $m$ is then found by continuing to factor each of $d$ and $\frac{m}{d}$.) The key insight was that searching for elements that have a factor in common with $m$ can be used to get a divisor of $m$, and such a search is more reasonable than searching for exact divisors. For instance, in the extreme (RSA) case in which $m$ is a product of two primes $p, q$, each of size $O(\sqrt{m})$, the integers

$$ p, 2p, \ldots, (q-1)p \quad\text{and}\quad q, 2q, \ldots, (p-1)q $$

are all divisible by $p$ or $q$, and we are more likely to locate one of these $p + q - 2 = O(\sqrt{m})$ integers than one of the two integers $p, q$.


A Certificate of Compositeness. Most modern factorization methods 
are probabilistic, in the sense that the method is likely to return a factor- 
ization for composite m, but no specific run is guaranteed to produce a 
factorization of m. Because of this, these probabilistic methods are incon- 
clusive when applied to a prime, and the methods are used only after m 
has obtained a “certificate of compositeness.” Since the most obvious way 
of showing that a number is composite is to demonstrate a factorization, 
at first this may seem like a strong requirement. But there are some rela- 
tively quick tests for compositeness. We mention a few that are based on 
material already developed in this chapter, and refer the interested reader 
to [30, Chapter 3] and cr.yp.to/primetests.html for more information.


²The authors of a recent text [30, p. 111] have calculated that "in one day of current workstation time, perhaps (the primality of) a 19-digit number could be resolved" using trial division. This translates to the factorization of a (very small) 38-digit RSA-composite requiring a full day of computation by trial division.




In the summer of 2002, Agrawal, Kayal, and Saxena found a deterministic primality test that runs in polynomial time, specifically $O(\log^{12+\epsilon}(m))$.³ The number theory community was amazed and delighted by the simple elegance of the test and its proof. As of this writing, the test is not yet of practical use.

Let $m$ be the number to be tested for compositeness. For a fixed natural number $a$, $1 < a < m$, we use the Euclidean Algorithm to calculate $d = \gcd(a, m)$. If $d \neq 1$, we know for certain that $m$ is composite, and have obtained its certificate of compositeness as well as the divisor $d$. We may therefore assume that $\gcd(a, m) = 1$. The contrapositive of Fermat's Little Theorem yields a Fermat Test for compositeness; namely, if there exists $a$, $1 < a < m$, such that $a^{m-1} \not\equiv 1 \pmod{m}$, then $m$ is not prime. Notice that since, for example, the composite number $m = 21$ satisfies $a^{20} \equiv 1 \pmod{m}$ for $a = 8$, the condition is only sufficient and not necessary for compositeness.
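Both steps just described, the gcd check and the Fermat Test, fit in a few lines of Python. This is a minimal sketch; the function name is ours, and it relies on the built-in three-argument `pow` and on `math.gcd`.

```python
from math import gcd

def fermat_proves_composite(a, m):
    """True if the base a (1 < a < m) certifies that m is composite."""
    if gcd(a, m) != 1:
        return True                   # a non-trivial common divisor was found
    return pow(a, m - 1, m) != 1      # a^(m-1) != 1 (mod m) contradicts Fermat's Little Theorem

print(fermat_proves_composite(2, 21))   # True:  2^20 = 4 (mod 21)
print(fermat_proves_composite(8, 21))   # False: 21 is a pseudoprime to the base 8
print(fermat_proves_composite(2, 13))   # False: 13 is prime, so no base can succeed
```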

A composite integer $m$ that satisfies $a^{m-1} \equiv 1 \pmod{m}$ for $1 < a < m$ is called a pseudoprime to the base $a$, where the term "pseudoprime" is used because for the base $a$ the composite number $m$ behaves as if it were prime. In 1950 Paul Erdős [59] proved that pseudoprimes are relatively rare when compared to primes, which means that performing a battery of Fermat Tests might be a useful strategy for verifying compositeness. However, in 1994 W.R. Alford, Andrew Granville, and Carl Pomerance [3] proved the existence of infinitely many composite $m$ that are pseudoprimes to every base which is relatively prime to $m$. These numbers are called Carmichael numbers, named in honor of R.D. Carmichael's 1912 work [24].

All is not lost, because we can invoke our theory and obtain a variant, called the Strong Fermat Test, that doesn't have this problem. The idea behind this is the following. If $m$ were prime, then by Fermat's Little Theorem, $b = a^{(m-1)/2}$ would be a solution to $x^2 - 1 = 0$, a quadratic equation that can have only the two solutions $x = \pm 1$ in the field $\mathbb{Z}_m$. Therefore, if we can find a non-zero base $a$ such that $a^{(m-1)/2}$ is something other than $\pm 1 \pmod{m}$, we are assured that $m$ is not prime. Because the exponent is smaller, this test is a bit easier to perform than a Fermat Test, but a more important property is that there are no "strong Carmichael" numbers. In fact, in 1980 Monier [114] and Rabin [132] independently proved that every odd composite $m > 9$ is a strong pseudoprime for at most one-quarter of the bases mod $m$. (If the widely believed but still-unproved Generalized Riemann Hypothesis is true, we'd actually be guaranteed that every odd composite $m$ is shown to be composite by a Strong Fermat Test with some positive base $a \le 2\log^2(m)$.) Therefore, $m$ must be composite if there is a non-zero base $a$ for which $a^{(m-1)/2} \not\equiv \pm 1 \pmod{m}$. For example, the Strong Fermat Test with $a = 2$ proves that $m = 18923$ is composite, since $\frac{m-1}{2} = 9461$ and $2^{9461} = 8144$ in $\mathbb{Z}_m$.

³www.cse.iitk.ac.in/primality.pdf
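The test in the form stated above, a single check of whether $a^{(m-1)/2} \equiv \pm 1 \pmod{m}$, can be sketched as follows (our code and naming; it implements only this one check). It reproduces the computation for $m = 18923$.

```python
def strong_fermat_proves_composite(a, m):
    """For odd m >= 3: True if a^((m-1)/2) is neither 1 nor -1 mod m, which proves m composite."""
    if m < 3 or m % 2 == 0:
        raise ValueError("m must be an odd integer >= 3")
    r = pow(a, (m - 1) // 2, m)
    return r not in (1, m - 1)

print(pow(2, 9461, 18923))                       # 8144
print(strong_fermat_proves_composite(2, 18923))  # True
```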


The Pollard Rho Method. We will now discuss one of the earliest modern methods for factoring an integer $m$. The method involves a clever use of orbits of a polynomial modulo $m$, which we know must be eventually periodic and so can be drawn in the shape of the Greek letter $\rho$, where the tail indicates the pre-period of the orbit. (Refer to Figure 8.1.) The method is now usually called the Pollard Rho Method, although in his original article [128] Pollard referred to the algorithm as the Monte Carlo Method because it is probabilistic and relies on a polynomial that has "sufficiently random" orbits. In practice, a quadratic polynomial is usually used unless some special knowledge of the divisors of $m$ indicates that a polynomial of higher degree would work better.

The method relies on the iterates of $f(x)$ being sufficiently random that a generic orbit of $f(x)$ modulo $m$ has the property that there is a proper divisor $d$ of $m$ such that the orbit gets into a cycle modulo $d$ well before it begins to repeat modulo $m$. Such an orbit yields a non-trivial divisor of $m$, since $a \equiv b \pmod{d}$ implies that $g = \gcd(a - b, m)$ is a multiple of $d$, while $a \not\equiv b \pmod{m}$ ensures that $g$ is a proper divisor of $m$. So, a successful orbit is sufficiently random that it becomes periodic modulo some divisor $d$ before
it becomes periodic modulo $m$. In [74] Richard Guy stated that for prime $p$ the orbits of $f(x) = x^2 + 1$ modulo $p$ seem to cycle quite quickly, and he conjectured that the number of iterates needed to cycle is $O(\sqrt{p \ln(p)})$.

Writing the conditions for successful orbits in terms of iteration of the function $f(x)$, we want the sequence of iterates to contain some $s \in \mathbb{Z}_m$ such that there are integers $n, k$ (say $n < k$) for which

$$ f^{(n)}(s) \equiv f^{(k)}(s) \pmod{d} $$

but

$$ f^{(i)}(s) \not\equiv f^{(j)}(s) \pmod{m} \quad\text{for all } 0 \le i < j \le k. $$

Then for $\Delta = f^{(k)}(s) - f^{(n)}(s) \pmod{m}$, $g = \gcd(\Delta, m)$ is a proper divisor of $m$. Popular implementations of the method recognize that it's not necessary to find the smallest values of $n$ and $k$ for which $f^{(n)}(s) \equiv f^{(k)}(s) \pmod{d}$. For example, Floyd's cycle-detecting method uses the idea of subtracting the "fast" sequence $(f^{(2n)}(s))$ from the "slow" sequence $(f^{(n)}(s))$ and then using the sequence

$$ \gcd\bigl(f^{(n)}(s) - f^{(2n)}(s),\ m\bigr). $$

(Refer to the work of R. Brent and J. Pollard [13] for other refinements.) Here’s an algorithm:
PROCEDURE POLLARD (m, f(x))
    Randomly generate s.
    t := s; d := 1
    WHILE d = 1 DO
        s := f(s) mod m; t := f(f(t)) mod m; d := gcd(s - t, m)
    ENDWHILE
    RETURN (d)


If POLLARD$(m, f(x)) = m$ is returned, the procedure can be repeated using another choice of either the polynomial $f$ or the seed $s$.
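A minimal Python rendering of POLLARD, assuming the Floyd form described above (the slow iterate is $f^{(n)}(s)$, the fast one $f^{(2n)}(s)$). The seed and polynomial are passed as arguments instead of being generated randomly, so that a run can be reproduced; here it uses the text's example values $m = 18923$, $f(x) = x^2 + 1$, $s = 2$.

```python
from math import gcd

def pollard(m, f, s):
    """Floyd form of POLLARD: the first gcd(x_n - x_{2n}, m) > 1 for n = 1, 2, ..."""
    slow = fast = s % m
    d = 1
    while d == 1:
        slow = f(slow) % m            # x_n
        fast = f(f(fast) % m) % m     # x_{2n}
        d = gcd(slow - fast, m)       # math.gcd accepts negative arguments
    return d                          # d == m means this run failed: change f or s

f = lambda x: x * x + 1
orbit_of_2 = [2]
for _ in range(13):
    orbit_of_2.append(f(orbit_of_2[-1]) % 18923)

print(orbit_of_2)
print(pollard(18923, f, 2))           # 127
```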

We have already commented that $m = 18923$ can be shown to be composite by the Strong Fermat Test with $a = 2$. Let's now use the Pollard Rho Method with $f(x) = x^2 + 1$ to factor $m$. For instance, the first few elements of the orbit of 2 under $f(x)$ mod $m$ are

2, 5, 26, 677, 4178, 8679, 11502, 5312, 3152, 530, 15979, 403, 11026, 11325.

Applying the Pollard Rho Method gives $d_n = 1$ for all $n < 8$ and then the divisor $d_8 = 127$, so that $18923 = 127 \cdot 149$. From this example you see that the gcd sequence can begin with a rather long string of ones. Because of this, implementations of the method often take giant steps through the gcd sequence: calculating several $\Delta_i = f^{(i)}(s) - f^{(2i)}(s)$, finding their product $A$ mod $m$, and then computing $g = \gcd(A, m)$. Provided $A \not\equiv 0 \pmod{m}$, $g$ is a proper divisor of $m$ whenever at least one $\Delta_i$ has a non-trivial common divisor with $m$.
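Giant steps can be added to the Floyd loop as in the following hedged sketch (the names and the default batch size are our choices). One gcd is taken per batch of differences; if the product happens to vanish mod $m$, the case excluded by the proviso on $A$, the batch is replayed one step at a time. That replay is our addition, not part of the text's description.

```python
from math import gcd

def pollard_batched(m, f, s, batch=10):
    """Floyd iteration with giant steps: one gcd per `batch` differences x_n - x_{2n}."""
    slow = fast = s % m
    while True:
        start = (slow, fast)                  # remember where this batch began
        prod = 1
        for _ in range(batch):
            slow = f(slow) % m
            fast = f(f(fast) % m) % m
            prod = prod * (slow - fast) % m   # A = product of the differences, mod m
        g = gcd(prod, m)
        if g == 1:
            continue                          # no factor yet: take the next giant step
        if g < m:
            return g                          # proper divisor found
        # g == m: the product vanished mod m, so replay this batch one step at a time.
        slow, fast = start
        for _ in range(batch):
            slow = f(slow) % m
            fast = f(f(fast) % m) % m
            g = gcd(slow - fast, m)
            if g != 1:
                return g                      # may still be m, meaning the run failed
        return m                              # not reached: some step shares a factor with m

f = lambda x: x * x + 1
print(pollard_batched(18923, f, 2, batch=10))   # 127
```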

It is useful to modify the condition in the WHILE statement so that the loop ends when it seems likely that the run will not be successful. This translates to having a good estimate of when the first duplication is likely to occur for a given modulus; that is, we want an estimate for the length of the letter $\rho$ in Figure 8.1. In mathematical terms, the probability that there is a repeated element among $k$ elements chosen from a set with $n$ elements is $1 - P(k, n)$, where

$$ P(k, n) = \frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k} = \Bigl(1 - \frac{1}{n}\Bigr)\Bigl(1 - \frac{2}{n}\Bigr)\cdots\Bigl(1 - \frac{k-1}{n}\Bigr) $$

is the probability that the $k$ choices are distinct. Estimating this probability is often referred to as The Birthday Problem, because it can be used to guess the likelihood that at least two people in a crowd of $M$ people have the same birthday. (Refer to Exercise 8.38.) In Exercise 8.37 you find fairly tight bounds on this probability, which yield the following heuristic for terminating the loop. Remember that we don't want a duplication mod $m$, and from Exercise 8.37(c) we see that a duplication mod $m$ is unlikely before $0.8\sqrt{m}$ iterations. On the other hand, since $m$ is composite, it has a divisor $d$ within the interval $(1, \sqrt{m}\,]$, and so from Exercise 8.37(b) we expect a duplication mod $d$ within $1.39\sqrt{d}$ steps, that is, within $1.39\sqrt[4]{m}$ steps, which for large $m$ is quicker than $0.8\sqrt{m}$ steps. Because of this, it is reasonable to stop the run if it doesn't yield a divisor within $1.39\sqrt[4]{m}$ iterations.
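The probabilities behind this heuristic are easy to evaluate directly. This Python sketch (the names are hypothetical) computes $P(k, m)$ in log form to avoid underflow, spot-checks the bounds $\exp(-k^2/m) \le P(k, m) \le \exp(-k(k-1)/(2m))$ stated in Exercise 8.37, and turns the cutoff of about $1.39\,m^{1/4}$ iterations into a helper `step_limit`.

```python
import math

def log_prob_distinct(k, m):
    """log P(k, m), where P(k, m) = (1 - 1/m)(1 - 2/m)...(1 - (k-1)/m), for 1 <= k <= m."""
    return sum(math.log1p(-i / m) for i in range(1, k))

def prob_distinct(k, m):
    return math.exp(log_prob_distinct(k, m))

def bounds_hold(k, m):
    """Check exp(-k^2/m) <= P(k, m) <= exp(-k(k-1)/(2m)), compared on the log scale."""
    lp = log_prob_distinct(k, m)
    return -k * k / m <= lp <= -k * (k - 1) / (2 * m)

def step_limit(m):
    """Stopping heuristic from the text: about 1.39 * m**(1/4) iterations."""
    return math.ceil(1.39 * m ** 0.25)

m = 10_000
print(all(bounds_hold(k, m) for k in range(1, 501)))             # True
print(prob_distinct(80, m) > 0.5, prob_distinct(139, m) < 0.5)   # True True
print(step_limit(18923))                                         # 17
```

For $m = 18923$ the cutoff is 17 iterations, comfortably above the 8 steps needed in the example.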


8.6 Exercises 


Ex 8.1. For any positive integer m, show that there are infinitely many 
Fibonacci numbers that are divisible by m. 
Hint: Consider the Fibonacci sequence mod m. 


Ex 8.2. Find all periodic and eventually periodic orbits of $f(x) = -x^2 + 2x + 1$.


Ex 8.3. (a) Show that $\gcd(19, 157) = 1$. Find an integer solution to $19x + 157y = 1$ and show that there exist no positive integer solutions to $19x + 157y = 1$.
(b) Find the multiplicative inverse of 214 (mod 12323).
(c) Show that there are integer points $(x, y)$ on the line $1792x + 2351y = 1$. Find the integer point on the line that is closest to the origin.


Ex 8.4. For the finite field $F$, define $n \cdot 1$ to be the sum of $n$ copies of the multiplicative identity 1 in $F$.
(a) Show that there exists a least positive integer $n_0$ such that $n_0 \cdot 1 = 0$, the additive identity in $F$.
(b) Show that $n_0$ must be prime. This is called the field characteristic of $F$.


Ex 8.5. This problem contains a justification of Montgomery multiplica- 
tion. (Refer to the procedure in Section 8.1.2.) 

(a) Write the recursions for the sequences $(Q_i)$ and $(R_i)$.

(b) Use congruence modulo $r$ to show that each $R_i$ is an integer.

(c) Use induction to show that for each $0 \le i \le k$,

$$ r^{i+1} R_i \equiv a_0 + a_1 r + \cdots + a_i r^i \pmod{m}. $$


(d) Complete the justification by proving that r*+1.R, < 2m. 
(e) What is the output from MULT$(r^{k+1}a, r^{k+1}b)$?


Ex 8.6. Let $F$ be a finite field of characteristic $p$. Show that $F$ can be regarded as a finite-dimensional vector space over $\mathbb{Z}_p$, and therefore $F$ has $p^n$ elements for some $n \ge 1$.
Ex 8.7. Let $F$ be a finite field with $q = p^n$ elements. Show that for any $f \in F[x]$, $(f(x))^p = f(x^p)$ and that $(f(x))^q = f(x^q)$ follows by induction.
Hint: Use the Binomial Theorem.


Ex 8.8. Let $F$ be any finite field and $n$ be a positive integer. Let $V$ be the set of all polynomials in $F[x]$ of degree less than $n$.
(a) Show that $V$ is an abelian group under the operation of polynomial addition in $F[x]$.
(b) Show that $V$ is a vector space of dimension $n$ over $F$. Therefore, the number of elements in $V$ equals $q^n$, where $q$ is the number of elements in $F$.


Ex 8.9. Let $g(x) \in F[x]$ be a fixed polynomial, with $n = \deg(g)$. Let $V$ be the set of all polynomials in $F[x]$ whose degree is less than $n$, which was shown to be a vector space in the last exercise. Impose still more structure on $V$ by defining an additional operation $*$ as follows. For any $a(x), b(x) \in V$ we use the Division Algorithm for polynomials to obtain $q(x), r(x) \in F[x]$ such that

$$ a(x)b(x) = q(x)g(x) + r(x) \quad\text{where } \deg(r) < n, $$

and define the operation $*$ on $V$ by $a(x) * b(x) = r(x) \in V$. Show that $*$ is an associative commutative operation on $V$ that is distributive over the usual addition defined as in Exercise 8.8. Is $V$ always a field under the operations of $+$ and $*$? In what follows we use the notation $F[x]/(g(x))$ for $V$.


Ex 8.10. Let $F = \mathbb{Z}_2$ and $g(x) = (x^2 + 1)(x^3 + x + 1)$. Show that neither $x^2 + 1$ nor $x^3 + x + 1$ has a multiplicative inverse in $F[x]/(g(x))$.


Ex 8.11. Let $F$ be any finite field and let $g(x)$ be a reducible polynomial in $F[x]$; that is, there exist polynomials $a(x), b(x) \in F[x]$ such that $g(x) = a(x)b(x)$ with $1 \le \deg(a), \deg(b) < \deg(g)$. Show that neither $a(x)$ nor $b(x)$ has an inverse under $*$ in $F[x]/(g(x))$.


Ex 8.12. Let $F$ be any finite field. Let $g(x)$ be a polynomial that is irreducible in $F[x]$. Verify that $L = F[x]/(g(x))$ is a finite field under the operations of addition and $*$.


Ex 8.13. Verify that $x^2 + 1$ is irreducible in $\mathbb{Z}_3[x]$ and use it to construct a field with nine elements.


Ex 8.14. In Section 8.2 we defined an abelian group. When the requirement of commutativity is removed, the structure is called a group. Let $R$ be either a finite field or $\mathbb{Z}_m$ for some $m \ge 2$.
(a) Show that the set $S$ of all $2 \times 2$ invertible matrices with entries in $R$ is a finite group under multiplication.
(b) Show that for any $A \in S$ there exists a positive integer $k$ such that $A^k = I$; the least such $k$ is called the order of $A$ in the group $S$. (For this, you might mimic the argument we used to prove that every invertible element of $\mathbb{Z}_m$ has an order.)


Ex 8.15. Find the orbits of x = 3 and x = 4 under S(x) = 12” +4 on 
R= Zo,. Formulate a conjecture about the periods of all orbits under this 
map. Check your conjecture by computing the orbits. 


Ex 8.16. Suppose $F, K$ are finite fields with $F \subseteq K$, and let $q$ be the number of elements in $F$. For $\alpha \in K$, show that $\alpha \in F$ iff $\alpha^q = \alpha$. This is a generalization of Theorem 8.3.6.


Ex 8.17. Let $S_n$ denote the set of all permutations of $n$ elements, where a permutation means an ordering (or bijection) of the integers $1, 2, \ldots, n$. Show that $S_n$ is a group under composition of functions. What is the identity element in this group?


Ex 8.18. Now consider the action of shuffling a deck of $n$ cards. Every shuffle of $n$ cards can be viewed as an element of $S_n$, and so every shuffle has an order in the group $S_n$. What is the practical meaning of the order of a shuffle?


Ex 8.19. For this problem consider $S_{12}$, the group of shuffles of a deck with twelve cards. We'll call a shuffle perfect if the shuffle begins by splitting the deck into two equal piles and then alternates between the two piles with the card on the bottom of the original second pile becoming the bottom card. (This is also called a riffle shuffle or riffling.) For instance, a perfect shuffle of six cards changes 1, 2, 3, 4, 5, 6 into 4, 1, 5, 2, 6, 3.

(a) Show that after $i$ perfect shuffles, the original first card is in the $2^i$ position mod 13. After how many shuffles is the first card returned to its original place?

(b) What is the order of a perfect shuffle in $S_{12}$?


Ex 8.20. What is the order of a perfect shuffle in $S_{52}$?


Ex 8.21. (This problem is based on the work of Joseph Keller in [83].) Consider a riffle shuffle of a deck with $k$ cards. By this we mean cutting the deck once at random and then riffling together the two parts formed by the cut. For this we assume that cutting satisfies a uniform distribution, that is, the probability of any cut is $1/(k-1)$. Let $p_n$ be the probability that the original bottom card is on the bottom of the deck after $n$ riffle shuffles.
(a) Show that $p_0 = 1$ and for all $n \ge 0$,

$$ p_{n+1} = \frac{1}{2}\Bigl(p_n + (1 - p_n)\frac{1}{k-1}\Bigr). $$

(b) Use the theory of linear recurrences to show that

$$ p_n = \frac{1}{k} + \frac{k-1}{k}\Bigl(\frac{k-2}{2(k-1)}\Bigr)^n. $$
Ex 8.22. For this exercise use $F = \mathbb{Z}_p$, where $p$ equals the prime 5009. Without calculating any orbits, find the period of $x = 0$ under each of $f(x) = 8x + 1$; $f(x) = 4x + 7$.


Ex 8.23. Use induction to show that the number of solutions to any poly- 
nomial equation with coefficients in a field is bounded by the degree of the 
polynomial. 


Ex 8.24. For this exercise, use $F = \mathbb{Z}_p$, where $p$ equals the prime 1361.
Check that 3 is a primitive element in F. Find a first-order linear recurrence 
mod p whose period is 85. 


Ex 8.25. Let A be the companion matrix for the Fibonacci recurrence 
mod 6. 
(a) What is the order of the matrix A? 
(b) Let $(s_n)$ be the sequence that satisfies the Fibonacci recurrence mod 6 and has initial state vector $(3, 3)$. Find the period of this sequence by considering the sequence mod each of the primes 2 and 3.


Ex 8.26. (a) Use induction to show that the Fibonacci sequence $(f_i)$ satisfies

$$ f_{2n} = f_n(2f_{n+1} - f_n) \quad\text{and}\quad f_{2n+1} = f_n^2 + f_{n+1}^2. $$

Hint: Verify the two identities in tandem.
(b) Use part (a) to show that the Fibonacci period mod $2^n$ is $3 \cdot 2^{n-1}$.
(c) For this problem let $F$ be the companion matrix for the Fibonacci recurrence. Show that $F^3 = I + 2F$ and use this to find the period of the Fibonacci sequence mod $2^n$.


Ex 8.27. Let $R$ be either $\mathbb{Z}_m$ or a finite field. Let $s_{n+k} = c_1 s_{n+k-1} + \cdots + c_k s_n$ be a linear recurrence in $R$ with $c_k$ an invertible element of $R$.
(a) Show that the companion matrix is invertible.
(b) If the first $k$ state vectors form a spanning set for $R^k$, show that the period of any non-zero solution to the recurrence equals the order of the matrix.


Ex 8.28. Let $F$ be a finite field and $(s_n)$ a non-zero sequence that satisfies a homogeneous second-order recurrence in $F$ with $s_0 \neq 0$.

(a) Show that the period of $(s_n)$ is less than the order of the companion matrix of the recurrence iff $s_1 s_0^{-1}$ is a root of the characteristic polynomial of the recurrence.

(b) If $F = \mathbb{Z}_p$ for some prime $p \equiv \pm 3 \pmod{10}$, show that the period of any non-zero solution to the Fibonacci recurrence is the order of the companion matrix.


Ex 8.29. For any $m \ge 2$, let $X = \mathbb{Z}_m^2$ and $f(x_1, x_2) = (x_2, x_1 + x_2 \bmod m)$.
(a) If $t$ is the period of the usual Fibonacci sequence $(f_i)$ mod $m$, show that $f_{t-j} \equiv (-1)^{j+1} f_j \pmod{m}$ for all $0 \le j \le t$.

(b) If the period of $(a, b) \in X$ under $f$ is odd, show that $m = 2$ must hold.

(c) Show that $(0, 0)$ is the only fixed point of $f$.

(d) Show that no orbit of $f$ has period equal to 2.


Ex 8.30. (a) What is the largest period for a sequence that satisfies the Fibonacci recurrence mod 5?

(b) How many different sequences satisfy the Fibonacci recurrence mod 
5? 

(c) Show that there is only one non-zero Fibonacci orbit mod 3. 

(d) Without calculating any sequences, show that every Fibonacci period 
mod 7 must divide 16. 

(e) Find a Fibonacci period mod 7 that equals 16. 


Ex 8.31. (a) Factor $x^2 - x - 1$ in $\mathbb{Z}_{11}[x]$.
(b) Without calculating the actual orbit, find the period of the orbit of 
(0,1) under the Fibonacci recurrence mod 11. Check your answer by 
calculating the orbit. 


Ex 8.32. This is a problem from the American Mathematical Monthly, 
March 1992, page 278. 

(a) Given a positive integer m, show that the modular Fibonacci pe- 
riod t(m) satisfies t(m) < 6m for all m and that equality holds for 
infinitely many m. 

(b) Show that an analogous result holds for the Lucas sequence with the 
upper bound of 6m replaced by 4m. 


Ex 8.33. Find the period of 0 under $f(x) = 3x^{-1} - 2$ in $\mathbb{Z}_{11}$ (where $0^{-1}$ is defined to be 0). Find an inversive generator in $\mathbb{Z}_{11}$ whose period is 11.


Ex 8.34. Count the number of reducible quadratic polynomials in a field 
with q elements, and use that information to show that every finite field 
has at least one irreducible quadratic polynomial. 


Ex 8.35. (a) Use a Fermat Test to show that $2047 = 2^{11} - 1$ is a composite number.
(b) Show that $m = 561$ is a Carmichael number.
(c) Prove: If $m$ fails the Fermat Test for $a = 2$, then $N = 2^m - 1$ fails the Strong Fermat Test for $a = 2$.


Ex 8.36. (a) Show that the graph of $H(x) = -\ln(1 - x)$ lies above the line $y = x$ by proving that $H(x)$ is an increasing function on $[0, 1)$, which is concave upward and satisfies $H(0) = 0$ and $H'(0) = 1$.

(b) Show that

$$ H\Bigl(\frac{1}{m}\Bigr) + \cdots + H\Bigl(\frac{k-1}{m}\Bigr) \ge \frac{1}{m} + \cdots + \frac{k-1}{m} = \frac{k(k-1)}{2m}. $$

(c) Use the monotonicity of $H(x)$ and integration by parts to show that

$$ H(0) + H\Bigl(\frac{1}{m}\Bigr) + \cdots + H\Bigl(\frac{k-1}{m}\Bigr) \le \frac{k^2}{m}. $$

Ex 8.37. For any fixed integer $m \ge 2$ and any $1 \le k \le m$, let $P(k, m)$ be the probability that $k$ elements chosen from a set with $m$ elements are all different.
(a) Use the last problem to show that

$$ \exp\Bigl(-\frac{k^2}{m}\Bigr) \le P(k, m) \le \exp\Bigl(-\frac{k(k-1)}{2m}\Bigr), \tag{8.10} $$

where $\exp(x) = e^x$, the usual exponential function.
(b) Prove: If $k > 1.39\sqrt{m}$, then $P(k, m) < \frac{1}{2}$.
(c) Prove: If $k < 0.8\sqrt{m}$, then $P(k, m) > \frac{1}{2}$.


Ex 8.38 (The Birthday Problem). Estimate the probability that at 
least two people in a crowd of M people have the same birthday. Estimate 
the number of people needed to ensure that this probability is greater than 
1/2. What size crowd ensures that the probability is greater than 2/3? 


Ex 8.39. Using an analysis similar to Exercise 8.36, show there exists a constant $c$ such that for sufficiently large $n$, the partial sum $\sum_{i=1}^{n} \frac{1}{i}$ is approximated by $\ln(n) + c$.


Ex 8.40. Let $m \ge 2$ be an integer and $f$ a function defined on $\mathbb{Z}^k$ for some $k \ge 1$, and let $s \in \mathbb{Z}^k$. Then $(f^{(i)}(s))$ is periodic mod $m$ iff there exists a positive integer $n$ such that

$$ f^{(0)}(s) \equiv f^{(n)}(s) \pmod{m}. $$


Ex 8.41. Factor $m = 7031$ using the Pollard Rho Method with the orbit of 3 under $f(x) = x^2 + 1$.


9 


Computational Complexity 


Analysis of algorithms is intimately related to recurrences. In this chapter 
we present many algorithms that are recursive in the sense that they call 
themselves. We’ll see that the analysis of each algorithm quickly leads to 
a recurrence that we can solve using the techniques of the previous chap- 
ters. The solution of the recurrence then provides information about the 
amount of resources used by the algorithm. We will also see that many 
easily stated unsolved problems are close to the edge of standard material. 
The analysis and improvement of basic algorithms provides a treasure chest 
of research problems that are fun (and maybe even profitable) to solve and 
are accessible to students. 

An algorithm is a procedure that solves a problem and is suitable for implementation as a program on a digital computer. This informal definition makes two important points. First, an algorithm solves a problem. There are computer programs that never terminate, and it would be very difficult to say whether such a program does anything, let alone solves a problem. Second, each step of the algorithm should be well-defined and should be representable, at least in principle, by a program. For example, $s := p/q$ is not well-defined if $q$ is allowed to be zero. Also, “Find the smallest $x \in X$ for which the statement $P(x)$ is true” is not necessarily well-defined, since it depends on the truth value of the statement $P(x)$ and the set $X$. For instance, if $P(x)$ were true for all negative integers, then “Find the smallest $x \in X$ for which the statement $P(x)$ is true” is not well-defined for $X = \mathbb{Z}$ but is well-defined for $X = \mathbb{N}$.

For our purposes and for many purposes, the above somewhat informal definition of algorithm is sufficient. Formal definitions of the term “algorithm” were created by Turing, Markoff, and others (refer to [138]). It’s very satisfying that all these formal definitions of algorithm define the same class of algorithms: When A is an algorithm in one of these senses, then in every one of the other senses there exists an algorithm that is equivalent to A.


9.1 Analysis of Algorithms 


The analysis of an algorithm attempts to assay the amount of resources used by the algorithm. For any solvable problem there are infinitely many algorithms that solve the problem, so how do we decide which is the best algorithm? An obvious idea is that “best” means “uses the fewest resources.” Typical resources are time and space, and in this chapter our analyses concentrate on time.

An abstract world of abstract computers and abstract programs is con- 
structed from the real world of actual computers and actual computer 
programs. This construction is rarely formal, because exact definitions of 
abstract entities are often not stated. A number of real-world limitations 
disappear in the abstract world. For example, real computers have a fixed 
finite memory size and there’s an upper bound on the size of numbers that 
can be represented by the computer. In the abstract world these limitations 
don’t exist. There, computers are assumed to have finite but unbounded 
memories with no bound on the size of numbers that can be used. In prac- 
tice, there are examples of algorithms that work relatively quickly when 
arbitrarily large numbers are used, but implementing them on real com- 
puters results in much slower algorithms. These algorithms make perfect 
sense in the abstract world, but have little or no relevance for the real 
world. 


9.1.1 Measuring run time 


We want to know how much time it takes an algorithm to perform a particular task. For a real computer program this can be done using a stopwatch to time the execution of the program. This elapsed time is often called the wall clock time. Another way of timing a real program is to use your computer’s TIME command so that the computer reports the time used when the program is run. This time measure is often called CPU¹ time and may differ drastically from wall clock time.

Both wall clock time and CPU time suffer from the real-world problem of inexact repeatability. Two different runs of the same program may not take the same amount of time, although it is certainly possible to gather valid statistical data. A more serious problem is that both wall clock time and CPU time are highly dependent on the exact hardware and software implementation used, as well as on the input data. Specifically, if you change operating systems, run the program on a different computer, or change the input data, you will often be unable to reliably predict how long the “same” program will take to run.

¹CPU means Central Processing Unit.
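The two time measures can be compared in a few lines of Python. This is a minimal sketch, assuming Python 3; `measure` and `sum_of_squares` are hypothetical helpers, and the standard-library functions `time.perf_counter` and `time.process_time` stand in for the stopwatch and the TIME command respectively.

```python
import time


def measure(task, *args):
    """Run task(*args) once; return (wall clock seconds, CPU seconds, result)."""
    wall_start = time.perf_counter()  # elapsed real time, like a stopwatch
    cpu_start = time.process_time()   # CPU time charged to this process only
    result = task(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, result


def sum_of_squares(n):
    """A CPU-bound task: its wall clock and CPU times should be close."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for label, task, arg in [("compute", sum_of_squares, 2_000_000),
                             ("sleep", time.sleep, 0.2)]:
        wall, cpu, _ = measure(task, arg)
        # Sleeping uses almost no CPU, so its wall clock time is much larger.
        print(f"{label}: wall {wall:.3f} s, CPU {cpu:.3f} s")
```

Running the script twice generally prints slightly different numbers, which is the inexact repeatability described above.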

We would like to call one algorithm faster than another (that is, it uses less time) if, when we run the two algorithms on a computer, the faster one always finishes first. To make this a fair test, some variables have to be removed. For example, we’d have to code the two algorithms in the same programming language, compile the two programs using the same compiler, run the two programs under the same operating system on the same computer, and not interfere with either program while it’s running. In practice, even if we could control all these conditions, to our chagrin we might find that algorithm A is faster under conditions C, while algorithm B is faster under conditions D.

To avoid this unhappy situation we calculate unitless time. For this we find the run time T(n) as a function of n, where n is some measure of the size of the problem. For example, we could use the number of digits as the measure of problem size if the problem is addition of two integers. We could use the number of elements in a list if the problem is to sort a finite list. We could use the number of edges (or the number of vertices) in a graph if the problem is to determine whether a finite graph has a certain property. We consider two algorithms to use the same time if their run times have the same order. For our purposes, two run times $T_1(n)$ and $T_2(n)$ in the variable n have the same order when $T_1(n) = \Theta(T_2(n))$, where Big-Theta is the notation defined in Chapter 1; namely, the statement $T_1(n) = \Theta(T_2(n))$ means that there exist positive constants $c_1, c_2$ such that for all sufficiently large n,

$$c_1 T_1(n) \le T_2(n) \le c_2 T_1(n).$$

In particular, “having the same order” doesn’t distinguish between algorithms whose run times are constant multiples of each other.
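A small worked instance of this definition, with two made-up run times chosen only for illustration: take $T_1(n) = n^2$ and $T_2(n) = 3n^2 + 5n$. For all $n \ge 5$ we have $5n \le n^2$, so

$$3\,T_1(n) = 3n^2 \;\le\; 3n^2 + 5n = T_2(n) \;\le\; 3n^2 + n^2 = 4\,T_1(n),$$

and the definition is satisfied with $c_1 = 3$ and $c_2 = 4$.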

It’s worthwhile to discuss Big-Theta notation further. Assume that A is an algorithm and that the size of the input data for A is represented by the variable n. A typical result might be that A has run time $\Theta(n^2)$. Since we’re using unitless time, we have no idea what this means in terms of seconds or nanoseconds. Indeed, we can think of the time unit as an unknown function of many details, among them the machine and the programming language. Nevertheless, the fact that the run time is $\Theta(n^2)$ allows us to reasonably predict that when all these details are kept constant, doubling the size of the input will quadruple the run time of A.
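One way to see the quadrupling prediction without a clock is to count basic steps instead of seconds. The sketch below is a hypothetical example: it counts the comparisons made when every pair of distinct items in a list of size n is compared once. That count is n(n − 1)/2, which is $\Theta(n^2)$, so the ratio of the counts for sizes 2n and n should approach 4 as n grows.

```python
def count_pair_comparisons(n):
    """Count the comparisons made when each pair of distinct items is compared once."""
    steps = 0
    for i in range(n):
        for j in range(i + 1, n):
            steps += 1  # one unit-cost comparison of item i with item j
    return steps


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        ratio = count_pair_comparisons(2 * n) / count_pair_comparisons(n)
        print(n, round(ratio, 4))  # the ratios decrease toward 4
```

The ratios are slightly above 4 for small n because of the lower-order term in n(n − 1)/2.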

There is even more hidden in Θ-notation. Because it’s asymptotic, the above prediction might not be valid unless n is quite large. For example, the actual CPU time for n = 2 can be the same as for n = 1 when the output for these values is computed by a table lookup. Further, for some programs the quadrupling prediction might be valid for n > 100, whereas for others the prediction might be valid only for n > 1024 or some larger number.

So how is the measure of run time used to compare two algorithms? Assume for the moment that algorithm A has run time $\Theta(n^2)$ and that algorithm B has run time $\Theta(n \log n)$. If we program these two algorithms, will the program for B be faster than the program for A? Yes, but only for large enough n. Depending on the constants implicit in the Θ-notation, it may be that B is faster than A only for $n > 10^{70}$, in which case for reasonably sized data the “slower” A might actually be faster.
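The effect of the hidden constants can be explored with made-up cost formulas. In this sketch the cost functions and the constants 1 and 100 are assumptions chosen only for illustration: A costs n² steps and B costs 100 n log₂ n steps.

```python
import math


def cost_a(n):
    """Hypothetical cost of algorithm A: Theta(n^2) with constant 1."""
    return n * n


def cost_b(n):
    """Hypothetical cost of algorithm B: Theta(n log n) with constant 100."""
    return 100 * n * math.log2(n)


def crossover(limit=10**6):
    """Return the smallest n >= 2 at which B is cheaper than A."""
    for n in range(2, limit):
        if cost_b(n) < cost_a(n):
            return n
    raise ValueError("B is never cheaper below the limit")


if __name__ == "__main__":
    print("B is first cheaper at n =", crossover())
    print("n = 100:", cost_a(100), "vs", round(cost_b(100)))
```

With these constants the crossover lands near n = 1000, and B stays cheaper beyond it; raising B’s constant pushes the crossover further out, which is how the “slower” algorithm can win on every input of realistic size.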

To summarize, if we find that the order of the run time for algorithm A 
is strictly less than that for algorithm B, then we can be confident that for 
any large enough problem A will run faster than B. On the other hand, if 
the run times of A and B have the same order, then we won’t be able to 
predict which one will be faster for any given input. 


9.1.2 An example: The Towers of Hanoi puzzle 


In this section we illustrate the above discussion by looking at a concrete 
example, the run time of an algorithm for solving the Towers of Hanoi 
puzzle. 

Ball’s Mathematical Recreations and Essays [5] contains one of the first mathematical formulations of the Towers of Hanoi puzzle. The puzzle consists of three towers or pegs (usually called A, B, C), and n disks of different sizes (numbered 1 through n) such that the ith disk is larger than the jth disk whenever i > j. Initially, the disks are stacked on Tower A in order of size, with the largest disk on the bottom and the smallest on top. The problem is to move the stack of disks from Tower A to Tower C, moving the disks one at a time in such a way that a larger disk is never stacked above a smaller disk. (Refer to Figure 9.1.) An extra constraint is that the sequence of moves should be as short as possible. An algorithm is therefore said to solve the Towers of Hanoi problem if, when we input the number of disks and the names of the three towers, the algorithm returns a sequence of moves that conforms to the above rules.


[Figure: initial configuration with disks 1 through n stacked on Tower A; final configuration with the same stack on Tower C]

FIGURE 9.1. The Towers of Hanoi Puzzle.
Three simple observations form the key to the problem. The first is that moving the largest disk requires all other disks to be out of the way; that is, the other n − 1 smaller disks must be located (in the proper order) on some other tower. The second is that, in order for the final result to be stacked on Tower C, Tower C must be free when the largest disk is moved, so the n − 1 smaller disks must first be moved from Tower A to Tower B, which is the Towers of Hanoi puzzle for n − 1 disks. The third is that after the largest disk is moved to Tower C, the other n − 1 disks must be moved from Tower B to Tower C, which is again the Towers of Hanoi puzzle for n − 1 disks. These observations lead to the following recursive algorithm.


PROCEDURE HANOI(A, B, C, n)
  IF n = 1 THEN Move the top disk from A to C
  ELSE HANOI(A, C, B, n − 1)
       Move the top disk from A to C
       HANOI(B, A, C, n − 1).


This is called a recursive procedure because it calls itself. Also, unlike the 
worm Ouroboros, it doesn’t endlessly swallow its own tail. Each time it calls 
itself it decrements the size parameter n by 1, which means that eventually 
the sequence of calls “bottoms out” with a call to HANOI with n = 1. This 
call makes one move and then returns to the previous call. The operation 
of this algorithm can be seen in Figure 9.2, where we give the sequence of 
calls and the states of the puzzle for n = 3. 

There’s one point in the trace that might seem at least slightly strange: How did the Move from A to C in the algorithm give Move from C to B in the trace? The answer is that “A” in the algorithm refers to the first parameter in the call, and “C” refers to the third parameter in the call. So when Move from A to C is referred to within a call to HANOI(C, A, B, n), it makes a move from C to B in the execution of the solution of the puzzle.
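A direct Python transcription of HANOI may help when following the trace. This is a sketch: returning the moves as a list of (from, to) pairs, using the strings "A", "B", "C" as tower names, and the helper `play` that replays moves and checks the size rule are all choices made for illustration.

```python
def hanoi(a, b, c, n, moves=None):
    """Return the moves made by HANOI(a, b, c, n) as (from, to) pairs.

    As in the pseudocode, a move "from A to C" always means from whatever
    tower was passed first to whatever tower was passed third.
    """
    if moves is None:
        moves = []
    if n == 1:
        moves.append((a, c))
    else:
        hanoi(a, c, b, n - 1, moves)  # park the n-1 smaller disks on b
        moves.append((a, c))          # move the largest disk
        hanoi(b, a, c, n - 1, moves)  # bring the n-1 disks back on top of it
    return moves


def play(moves, n):
    """Apply moves to n disks that start on tower "A"; reject illegal moves."""
    towers = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # bottom to top
    for src, dst in moves:
        disk = towers[src].pop()
        if towers[dst] and towers[dst][-1] < disk:
            raise ValueError(f"disk {disk} placed on a smaller disk at {dst}")
        towers[dst].append(disk)
    return towers


if __name__ == "__main__":
    for src, dst in hanoi("A", "B", "C", 3):
        print(f"Move from {src} to {dst}")
```

Called with n = 3, the script prints the same seven moves as the trace in Figure 9.2.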

Now that we have an algorithm for solving the Towers of Hanoi puzzle, 
let’s analyze its run time. Because we’re calculating unitless time, we don’t 
need to know how long it takes to perform operations such as moving a disk 
or issuing a procedure call. We do have to distinguish between operations 
that take a constant amount of time (that is, time that is independent of the 
value of n) and operations whose run time depends on n. Aho, Hopcroft, 
and Ullman in [2] call the assumption that each operation takes the same 
amount of time the uniform cost criterion. Under this condition the run 
time T(n) used for n disks satisfies the first-order recurrence 


(9.1)    $T(n) = 2T(n-1) + c$ for all $n > 1$,


where c is a positive constant. This is true because within the procedure for n disks there are two calls to the procedure for n − 1 disks, and the constant c is the sum of the constant run times for the various operations. From the work in Chapter 3 we know that (9.1) has the solution

$$T(n) = T(1)\,2^{n-1} + c\,(2^{n-1} - 1),$$

and so

$$\frac{T(1)}{2}\,2^n \le T(n) \le \frac{T(1) + c}{2}\,2^n,$$

which gives $T(n) = \Theta(2^n)$.

Initial state:  A: 3 2 1    B: (empty)    C: (empty)    (disks listed bottom to top)

HANOI(A,B,C,3)
   HANOI(A,C,B,2)
      HANOI(A,B,C,1)
         Move from A to C    A: 3 2      B: (empty)  C: 1
      Move from A to B       A: 3        B: 2        C: 1
      HANOI(C,A,B,1)
         Move from C to B    A: 3        B: 2 1      C: (empty)
   Move from A to C          A: (empty)  B: 2 1      C: 3
   HANOI(B,A,C,2)
      HANOI(B,C,A,1)
         Move from B to A    A: 1        B: 2        C: 3
      Move from B to C       A: 1        B: (empty)  C: 3 2
      HANOI(A,B,C,1)
         Move from A to C    A: (empty)  B: (empty)  C: 3 2 1

FIGURE 9.2. Trace of the program HANOI for n = 3.

Instead of assuming the uniform cost criterion, we could have assumed that the run time is proportional to the number of moves. Under that assumption, our argument could proceed as follows. Let M(n) be the number of moves performed when the algorithm is called with n disks. Then

$$M(1) = 1 \quad\text{and}\quad M(n) = 2M(n-1) + 1 \quad\text{for all } n > 1,$$

which has the solution $M(n) = 2^n - 1$. Because the run time T(n) is proportional to M(n), $T(n) = \Theta(M(n)) = \Theta(2^n)$, which is the same as we obtained above.
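The solution of (9.1) from Chapter 3 and the formula $M(n) = 2^n - 1$ can be checked by iterating the recurrences exactly with integers. A small sketch; the sample constants T(1) = 3 and c = 5 are arbitrary.

```python
def t_recurrence(n, t1, c):
    """Iterate (9.1): T(1) = t1 and T(k) = 2T(k-1) + c for k = 2, ..., n."""
    t = t1
    for _ in range(2, n + 1):
        t = 2 * t + c
    return t


def t_closed_form(n, t1, c):
    """Closed form T(n) = T(1) 2^(n-1) + c (2^(n-1) - 1)."""
    return t1 * 2 ** (n - 1) + c * (2 ** (n - 1) - 1)


def m_recurrence(n):
    """Number of moves: M(1) = 1 and M(k) = 2M(k-1) + 1."""
    m = 1
    for _ in range(2, n + 1):
        m = 2 * m + 1
    return m


if __name__ == "__main__":
    for n in range(1, 11):
        assert t_recurrence(n, 3, 5) == t_closed_form(n, 3, 5)
        assert m_recurrence(n) == 2 ** n - 1
    print("closed forms agree for n = 1..10")
```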

In our analysis of the algorithm HANOI we’ve made several other assump- 
tions that bear examination. In any actual implementation on a real digital 
computer, the number n needs to be stored and manipulated internally. The 
time required for this is at the very least dependent on the number of com- 
puter words required to represent n, but our argument assumed that each 
operation takes a constant time, independent of n. In addition, representing 
a state of the puzzle in a computer’s memory requires increasing memory 
as n increases and, presumably, increasing time to manipulate this memory. 
All of the operations in HANOI depend on the space needed to store n in memory, and this is proportional to the number of bits needed to represent n, which is $\lfloor \log_2 n \rfloor + 1$. We can ignore the rounding and consider space to be proportional to log n, where the constant of proportionality depends on the word size of the computer. Aho, Hopcroft, and Ullman [2] call this the logarithmic cost criterion and suggest that it should be used when the numbers in the algorithm don’t have fixed bounds. Using this criterion, the run time for the n-disk puzzle satisfies

$$T(n) = 2T(n-1) + c\log n,$$

with initial condition $T(1) = t_1$. (As above, c and $t_1$ are unknown positive constants.) Again referring to Chapter 3, the solution to this recurrence is

$$T(n) = 2^n\left(\frac{t_1}{2} + c\sum_{i=2}^{n}\frac{\log i}{2^i}\right),$$

where each summand is positive, and so for $n \ge 2$ the sum is less than the infinite sum:

$$0 < \sum_{i=2}^{n}\frac{\log i}{2^i} < \sum_{i=1}^{\infty}\frac{\log i}{2^i},$$

which converges by the Ratio Test. The sum is therefore bounded above and below by positive constants, and

$$c_1 2^n \le T(n) \le c_2 2^n,$$

where, for instance, $c_1$ and $c_2$ can be chosen as $c_1 = t_1/2$ and $c_2 = t_1/2 + c\sum_{i=1}^{\infty}(\log i)/2^i$. So again $T(n) = \Theta(2^n)$. The increase in operation time due to the size of n therefore has no effect on the order of our estimate of the run time, because any changes were absorbed into the “implied constant” in $\Theta(2^n)$.
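The bounds $c_1 2^n \le T(n) \le c_2 2^n$ can be watched numerically. A minimal sketch, assuming base-2 logarithms and the sample values $t_1 = c = 1$ (the text leaves both the base and the constants unspecified); the function names are hypothetical.

```python
import math


def t_log_cost(n, t1=1.0, c=1.0):
    """Iterate T(1) = t1, T(k) = 2T(k-1) + c*log2(k) for k = 2, ..., n."""
    t = t1
    for k in range(2, n + 1):
        t = 2 * t + c * math.log2(k)
    return t


def theta_constants(t1=1.0, c=1.0, terms=200):
    """Return (c1, c2) = (t1/2, t1/2 + c * sum_{i>=1} log2(i)/2^i).

    The infinite series is truncated after `terms` terms; the omitted
    tail is far below floating-point precision when terms = 200.
    """
    series = sum(math.log2(i) / 2 ** i for i in range(1, terms + 1))
    return t1 / 2, t1 / 2 + c * series


if __name__ == "__main__":
    c1, c2 = theta_constants()
    for n in (1, 5, 10, 20, 40):
        print(n, t_log_cost(n) / 2 ** n, "lies in", (c1, c2))
```

The printed ratios T(n)/2ⁿ increase and level off just below c₂, which is the $\Theta(2^n)$ behavior.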

If we consider building a physical Towers of Hanoi puzzle in which the time to move disk n grows as some function g(n), we would have

$$T(n) = 2T(n-1) + g(n).$$

Provided the eminently reasonable assumption that $g(n+1) \le r\,g(n)$ holds for some constant $r < 2$ and all sufficiently large n, a modification of the above argument again yields $T(n) = \Theta(2^n)$. From this we see that the conclusion that HANOI has $\Theta(2^n)$ run time is robust: specific details of the implementation of the algorithm have no effect on the order of the run time. Table 9.1 gives some actual run times and compares them to the predicted run time $c\,2^n$, where c is computed as $c = T(10)/2^{10}$. Notice that the ratio T(n)/T(n − 1) is approximately 2, which is consistent with $T(n) = \Theta(2^n)$.


TABLE 9.1. Run times for the HANOI Algorithm showing the predicted $\Theta(2^n)$ behavior.

[Table: Run Times for the HANOI Algorithm, with columns including Number of Moves and Predicted run time]


While our recursive algorithm solves the Towers of Hanoi problem, there are many other algorithms that also solve this problem. For example, Buneman and Levy [19] give a compact iterative algorithm for Towers of Hanoi. Among the variety of algorithms for this problem, is there a best algorithm? We don’t want to go deeply into this question, but obviously the answer depends on the definition of best. For Towers of Hanoi, every algorithm must output the $2^n - 1$ moves of a shortest solution, so every algorithm must use $\Omega(2^n)$ time. So, if best means least-time order, our recursive algorithm is best. But the Buneman and Levy algorithm with a reasonable data structure can also be shown to have $\Theta(2^n)$ run time and so is another best algorithm. So, there can be more than one best algorithm. Cull and Ecklund [41] consider a variety of Towers of Hanoi algorithms and show that every Towers of Hanoi algorithm must use $\Omega(2^n)$ time and at least n − c bits of space, for some constant c. They give an algorithm that simultaneously achieves these lower bounds on time and space.
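For comparison with the recursive procedure, here is a sketch of a well-known iterative method: the smallest disk moves cyclically around the towers on every other step, and each remaining step makes the only legal move that does not involve the smallest disk. It is offered as an illustration of an iterative Towers of Hanoi algorithm; it is not claimed to be the exact formulation of Buneman and Levy [19], and the name `hanoi_iterative` is hypothetical.

```python
def hanoi_iterative(n):
    """Return the shortest move list for n disks from "A" to "C", without recursion."""
    towers = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # bottom to top
    # The smallest disk cycles A->C->B when n is odd and A->B->C when n is even.
    cycle = ["A", "C", "B"] if n % 2 == 1 else ["A", "B", "C"]
    pos = 0  # index in `cycle` of the tower holding the smallest disk
    moves = []
    for step in range(2 ** n - 1):
        if step % 2 == 0:
            # Even steps: move the smallest disk one tower along its cycle.
            src, dst = cycle[pos], cycle[(pos + 1) % 3]
            pos = (pos + 1) % 3
        else:
            # Odd steps: between the other two towers, move the smaller top disk.
            x, y = [t for t in "ABC" if t != cycle[pos]]
            if not towers[x] or (towers[y] and towers[y][-1] < towers[x][-1]):
                src, dst = y, x
            else:
                src, dst = x, y
        towers[dst].append(towers[src].pop())
        moves.append((src, dst))
    return moves


if __name__ == "__main__":
    print(hanoi_iterative(3))
```

For n = 3 it produces the same seven moves as the trace in Figure 9.2.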

Finally, consider a problem from the Advanced Problems Section of the June-July 1939 issue of the American Mathematical Monthly (page 363), in which B.M. Stewart proposed a generalization of the Towers of Hanoi puzzle to any number k ≥ 3 of towers.


  Given a block in which are fixed k pegs and a set of n washers, no two alike in size, and arranged on one peg so that no washer is above a smaller washer. What is the minimum number of moves in which the n washers can be placed on another peg, if the washers must be moved one at a time, subject always to the condition that no washer be placed above a smaller washer?


Two years later, two solutions to the problem were published in the Ad- 
vanced Problems Section of the March 1941 issue of the same journal (pages 
216-217). These solutions yield algorithms for solving the Towers of Hanoi 
puzzle for k towers. Many people believe that these are optimal, but a proof 
of optimality is still an open question after 60 years. We close this section 
with a comment from a March 2002 interview with Donald Knuth [87, p. 
321]. “In the case of the 4-peg ‘Tower of Hanoi’, there are many, many ways 
to achieve what we think is the minimum number of moves, but we have 
no good way to characterize all solutions. So that’s why I personally came 
to the conclusion that I was never going to solve it, and I stopped working 
on it in 1972. But I spent a solid week working on it pretty hard.” 


9.2 Computer Arithmetic 


Binary representation is usually used to store an integer on a digital computer, and the number of bits for the representation of a positive integer N is $\lfloor \log_2 N \rfloor + 1$. This is then broken into blocks that fit into the internal words of the computer’s memory and that are stored as some sort of list. Every machine has a limit on the size of integers that can be represented in this fashion. In many classical integer data types this was limited to one or two machine words, whereas many present-day applications require no pre-set bounds on the size of integers.

In any storage scheme there are details that are specific to the machine 
and the operating system. Among these are the word size and the manner in 
which the list is stored. In analysis of algorithms these details are avoided by 
considering the number of bit operations required to perform a calculation. 
A bit operation is either a unary operation (an operation performed on one bit, such as reversing a single bit) or a binary operation (an operation on two bits, such as addition). We assume that the run time is proportional to the number of bit operations. For this to make sense, operating on whole words rather than single bits must change the run time by at most a constant factor, and the overhead required for keeping track of the list structure must not dominate our run time estimates.


9.2.1 Addition and subtraction 


How long does it take to add two n-bit natural numbers M and N? That is, how many bit operations are required to add the integers? For instance, consider the algorithm we all use for adding two binary representations by hand. It works column by column from the right, making at most n additions of bits, each of which may also have to add in a carry. If we write the carry bits in a row above the rows containing M and N, this row always has a zero in the rightmost bit and extends at most one place to the left. Taking this into account, at most 2n bit operations are used to perform the addition of two n-bit integers.

On the other hand, it takes at least n bit operations to write down the number M + N. This is true even in the extreme case in which one of the summands is zero and we know a priori that the sum is just equal to the other integer. Because of this, n is a lower bound on the number of bit operations needed to add M and N. Since M − N = M + (−N), the cost of subtracting N from M equals the cost of adding −N to M plus the cost of negating N. Since negation can be accomplished in n bit operations, we conclude that the time required to add or subtract two n-bit integers is $\Theta(n)$.
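The hand method can be written out with an explicit count of bit operations. This is a sketch under the accounting used above: each column is charged two bit operations, one to add the two operand bits and one to add the incoming carry. The helper names `to_bits`, `from_bits`, and `add_bits` are hypothetical, and bit lists are stored least significant bit first.

```python
def to_bits(x, width):
    """Bits of the natural number x, least significant first, padded to width."""
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def add_bits(m, n):
    """Add two equal-length bit lists; return (sum bits, bit operations used)."""
    assert len(m) == len(n)
    result, carry, ops = [], 0, 0
    for a, b in zip(m, n):
        s = a + b       # first bit operation: add the two operand bits
        s += carry      # second bit operation: add the carry from the right
        ops += 2
        result.append(s % 2)
        carry = s // 2
    result.append(carry)  # the carry row extends at most one place to the left
    return result, ops


if __name__ == "__main__":
    bits, ops = add_bits(to_bits(13, 4), to_bits(11, 4))
    print(from_bits(bits), ops)
```

Adding the 4-bit numbers 13 and 11 prints `24 8`: the sum, and 2n = 8 bit operations.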


9.2.2 Multiplication and division 


The standard algorithm used for hand computation of the product M * N in binary involves multiplying M by each bit of N (resulting either in 0 or M in binary), stacking up the results with appropriate left shifts, and then adding entries in the stack. For example, to compute 7 * 5 = 35 the standard algorithm is

        1 1 1
      * 1 0 1
      -------
        1 1 1
      0 0 0
    1 1 1
  -----------
  1 0 0 0 1 1    (= 35)

When M and N are n-bit integers, this algorithm requires n additions of integers that have at most 2n bits, and the complexity of this standard algorithm is $O(n^2)$.
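A sketch of the same shift-and-add method, using Python integers to hold each row of the tableau. The function names are hypothetical, and the sketch counts rows rather than individual bit operations.

```python
def partial_products(m, n):
    """Rows of the hand tableau for m * n: m times each bit of n, shifted left."""
    rows, shift = [], 0
    while n:
        rows.append(m << shift if n & 1 else 0)  # bit 1 gives shifted m, bit 0 gives 0
        n >>= 1
        shift += 1
    return rows


def shift_and_add(m, n):
    """Multiply natural numbers by adding the partial products; also return the row count."""
    rows = partial_products(m, n)
    return sum(rows), len(rows)


if __name__ == "__main__":
    for row in partial_products(7, 5):
        print(format(row, "b"))
    print(shift_and_add(7, 5))
```

For 7 * 5 the rows are 111, 0, and 11100 (111 shifted two places), and their sum is 35.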

This is not the best algorithm for integer multiplication, since there’s an easily implemented algorithm that has complexity $\Theta(n^{\log_2 3})$ (refer to Exercise 9.18). There’s also another algorithm, due to Schönhage and Strassen [145], which has asymptotic complexity $O(n \log n \log\log n)$, and therefore its run time has order only slightly larger than the order of addition. However, because the constant involved in the Big-Oh notation is large, the Schönhage-Strassen algorithm isn’t better than the more straightforward algorithm until n is quite large.

What about the cost of division? Later in this chapter (Section 9.4.4) we will use Newton’s method to design an algorithm for division that has the same order of run time as multiplication.


9.3 An Introduction to Divide-and-Conquer 


Historically, the idea of divide-and-conquer is often attributed to Julius 
Caesar [173]. In the context of problem solving, divide-and-conquer may 
reasonably be attributed to René Descartes [49, 50], the same Descartes 
whose Rule of Signs was used in Chapter 5 to prove that every nonnegative 
polynomial has only one positive root. The keystone of Descartes’ analytic 
method is to break a complicated problem into easier constituent parts and 
then to solve the individual parts. An understanding of the complicated 
problem is then built up from the solution of its parts. In How to Solve It [129], George Pólya stressed that breaking down a problem into several smaller problems of the same kind is a typical step in solving a mathematical problem, and he further pointed out that this sort of analysis easily leads to an inductive proof of the correctness of the solution. In the 1930s, logicians such as Gödel, Kleene, and Ackermann recognized the central role of this recursive technique, and their theoretical analyses led Turing to describe an abstract digital computer. Eventually, Turing machines were embodied as physical digital computers, and the problem of programming these computers led back to the divide-and-conquer technique. (For example, refer to [172] and [102].)

The term divide-and-conquer algorithm is now usually reserved for the design strategy that solves a problem of size n by solving several problems of size n/c for some constant c, so that in its current usage the “divide” means that there is an actual division of the problem size. The HANOI algorithm that we discussed in Section 9.1.2 is not a divide-and-conquer algorithm in this sense because it uses subproblems of size n − 1. (Perhaps a better term for this type of algorithm is subtract-and-conquer?) Many commonly used divide-and-conquer algorithms split the problem into two subproblems, each having half the size of the original problem. Because such an algorithm cannot divide indefinitely, we specify a size limit above which the problem is divided and below which the algorithm calls another algorithm that solves the small-size instances. The solutions are then combined to give a solution to the original problem.

The recursive structure of a divide-and-conquer algorithm leads directly 
to an inductive proof of correctness and also to a recurrence for the run 
time of the algorithm. 


9.3.1 Example: Polynomial multiplication 


Let’s look at an example that makes this discussion more concrete. Our example is polynomial multiplication (or convolution), where the output is the polynomial that is the product of the two input polynomials. We assume an arithmetic model of computing, which means both that our computer can perform arithmetic operations and that we want an algorithm that multiplies the two input polynomials using only arithmetic operations on the coefficients. In this model, the run time for an algorithm is the number of arithmetic operations used in the algorithm.

Assume that each input polynomial has degree n − 1 and hence n coefficients, which we record in a vector with n components. The output polynomial then has degree 2(n − 1), recorded as a vector with 2n − 1 components. The standard algorithm for multiplying polynomials is similar to the standard algorithm for multiplying integers. If P(x) and Q(x) are the input polynomials, multiplication proceeds by multiplying P(x) by the constant coefficient of Q(x), shifting one space to the left, multiplying P(x) by the linear coefficient of Q(x), and so on. This results in n vectors, which are added componentwise to get the final result. Here’s an example that multiplies two quadratic polynomials in this fashion:

[Worked example: the product of two quadratic polynomials, computed as rows of shifted partial products that are then added column by column]

This method for multiplying two polynomials has run time $\Theta(n^2)$ because each of the $n^2$ table entries is a product of two coefficients and each table entry occurs once as a summand.
An algorithm for polynomial multiplication

(all c[k] are initially 0)
FOR i := 0 TO n − 1 DO
    FOR j := 0 TO n − 1 DO
        c[i + j] := c[i + j] + a[i] * b[j]
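The double loop translates directly into Python. A sketch, assuming coefficient lists are stored constant term first (so index k holds the coefficient of $x^k$); the name `poly_mult` is hypothetical.

```python
def poly_mult(a, b):
    """Classical polynomial multiplication of coefficient lists (constant term first)."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)  # every c[k] starts at 0
    for i in range(len(a)):
        for j in range(len(b)):
            c[i + j] += a[i] * b[j]  # one coefficient multiplication per pair (i, j)
    return c


if __name__ == "__main__":
    # (5x^3 + 12x^2 + 7x + 8)(x + 1)
    print(poly_mult([8, 7, 12, 5], [1, 1]))
```

The example prints `[8, 15, 19, 17, 5]`, that is, $5x^4 + 17x^3 + 19x^2 + 15x + 8$.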


Let’s construct a divide-and-conquer algorithm for this problem. The basic philosophy of a divide-and-conquer approach to polynomial multiplication is to think of each input polynomial as constructed from several smaller polynomials. In practice, we divide each polynomial “in half,” and for simplicity we first consider polynomials with an even number of coefficients, say n = 2m. Then we can write

$$P(x) = a_{2m-1}x^{2m-1} + \cdots + a_1 x + a_0 = x^m P_1(x) + P_0(x),$$

where each of the polynomials $P_1(x) = a_{2m-1}x^{m-1} + \cdots + a_{m+1}x + a_m$ and $P_0(x) = a_{m-1}x^{m-1} + \cdots + a_1 x + a_0$ has m = n/2 coefficients. (For example, if $P(x) = 5x^3 + 12x^2 + 7x + 8$, then $P_1(x) = 5x + 12$ and $P_0(x) = 7x + 8$.) Splitting both input polynomials P(x) and Q(x) of degree 2m − 1 in this way and suppressing the argument variable x results in

$$PQ = (x^m P_1 + P_0)(x^m Q_1 + Q_0),$$

giving

(9.2)    $PQ = x^{2m} P_1Q_1 + x^m\,[P_1Q_0 + P_0Q_1] + P_0Q_0.$

Since multiplying a polynomial by an integer power of x simply shifts the sequence of coefficients, the original problem of finding the product PQ has been reduced to the four subproblems of finding the products $P_1Q_1$, $P_1Q_0$, $P_0Q_1$, and $P_0Q_0$.

Because m (and also the original n) might be odd, we will need to modify this procedure to get a divide-and-conquer algorithm that works for any pair of polynomials. Only a slight modification is needed. Let n − 1 equal the maximal degree of the factors P(x), Q(x) and define k to be the least nonnegative integer such that $2^k \ge n$, that is, $k = \lceil \log_2 n \rceil$. Filling in zero entries where necessary, we treat P(x) and Q(x) as polynomials with $2^k$ coefficients. Dividing each of P(x) and Q(x) into their polynomial halves, we obtain four subpolynomials having $2^{k-1}$ coefficients. We continue this subdivision process for k stages, or for as long as we have more than one coefficient. When we reach the stage at which each subpolynomial has one coefficient, we multiply the pairs of constants that remain. Equation (9.2) is then used to retrace the steps and arrive at the product polynomial.
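A sketch of this padded four-product algorithm in Python, with coefficient lists stored constant term first; the helpers `poly_add`, `shift`, `dc_mult`, and `multiply` are hypothetical names. Each recursive call makes the four half-size products of (9.2).

```python
def poly_add(p, q):
    """Coefficientwise sum of two coefficient lists of possibly different lengths."""
    if len(p) < len(q):
        p, q = q, p
    return [x + (q[i] if i < len(q) else 0) for i, x in enumerate(p)]


def shift(p, k):
    """Multiply a polynomial by x^k by prepending k zero coefficients."""
    return [0] * k + p


def dc_mult(p, q):
    """Four-product divide-and-conquer; len(p) == len(q) must be a power of 2."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    m = n // 2
    p0, p1 = p[:m], p[m:]  # P = x^m P1 + P0
    q0, q1 = q[:m], q[m:]  # Q = x^m Q1 + Q0
    high = dc_mult(p1, q1)
    middle = poly_add(dc_mult(p1, q0), dc_mult(p0, q1))
    low = dc_mult(p0, q0)
    # Equation (9.2): PQ = x^(2m) P1Q1 + x^m [P1Q0 + P0Q1] + P0Q0
    return poly_add(poly_add(shift(high, 2 * m), shift(middle, m)), low)


def multiply(p, q):
    """Pad both factors with zeros to 2^k coefficients, multiply, trim the padding."""
    size = 1
    while size < max(len(p), len(q)):
        size *= 2
    padded_p = p + [0] * (size - len(p))
    padded_q = q + [0] * (size - len(q))
    return dc_mult(padded_p, padded_q)[: len(p) + len(q) - 1]


if __name__ == "__main__":
    print(multiply([1, 2, 3], [4, 5, 6]))  # (1 + 2x + 3x^2)(4 + 5x + 6x^2)
```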

The run time analysis for this divide-and-conquer algorithm is fairly simple. From the first division we obtain four subproblems, each a multiplication of two polynomials with $2^{k-1}$ coefficients. The multiplications by $x^{2m}$ and $x^m$ are shifts in the coefficient vectors and are easily carried out before the vectors are added. Therefore, if T(n) denotes the time needed to multiply two polynomials with n coefficients (where we consider $n = 2^k$), then

(9.3)    $T(n) = 4T(n/2) + bn$,

where 4T(n/2) represents the time needed to compute the four half-size products, and the bn term comes from the time needed to shift and add the results. If we were doing an exact operation count we’d write down an explicit value for b. However, since we don’t know the actual time required for coefficient addition or multiplication, we simply assume that b is some positive constant.

Notice that the recurrence (9.3) is not one of our usual linear recurrences, because T(n) depends on T(n/2) rather than on T(n − 1). Despite this novelty, it’s possible to convert (9.3) to a linear recurrence. For this, we exploit the fact that n is assumed to be a power of 2. Introducing the new variable $t_k = T(2^k)$, (9.3) becomes

$$t_k = 4t_{k-1} + b\,2^k \quad\text{with}\quad t_0 = T(1),$$

a nonnegative recurrence with dominant eigenvalue $\lambda_0 = 4$. It has the form

$$t_k = 4t_{k-1} + b\,\lambda_1^k, \quad\text{where } \lambda_1 = 2 < \lambda_0,$$

and Theorem 5.5.3 implies that $t_k = \Theta(4^k)$, giving $T(n) = \Theta(n^2)$ when n is a power of two. What about other n’s? Even when n is not a power of two, we’ve already established that we can write the factors as polynomials with $2^k$ coefficients, where $k = \lceil \log_2 n \rceil$, and a polynomial with $2^{k-1}$ coefficients can be treated as a polynomial with n coefficients, so we have

$$T(2^{k-1}) \le T(n) \le T(2^k),$$

and since $T(2^k) = \Theta(4^k)$, there are positive constants $c_1, c_2$ with

$$c_1\left(\frac{n}{2}\right)^2 \le c_1 4^{k-1} \le T(n) \le c_2 4^k \le c_2 (2n)^2 = 4c_2 n^2,$$

and $T(n) = \Theta(n^2)$. What’s the point of developing this algorithm when we already have an easy-to-understand iterative algorithm with run time $\Theta(n^2)$? The answer is that a further examination of this algorithm will allow us to make a slight modification that speeds things up.
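For n a power of two, the linear recurrence for $t_k$ can also be solved exactly, which confirms the $\Theta(n^2)$ estimate. A worked derivation (the trial solution $A\,2^k$ is the usual guess for the $b\,2^k$ term):

$$t_k = 4t_{k-1} + b\,2^k, \qquad t_0 = T(1).$$

Substituting $t_k = A\,2^k$ gives $A\,2^k = 2A\,2^k + b\,2^k$, so $A = -b$, and the general solution is $t_k = C\,4^k - b\,2^k$. The initial condition gives $C = T(1) + b$, so

$$t_k = (T(1) + b)\,4^k - b\,2^k, \qquad T(n) = (T(1) + b)\,n^2 - b\,n \quad (n = 2^k).$$

Since $T(1) + b > 0$, this is $\Theta(n^2)$.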

Returning to the formula in (9.2) we notice that for two of the half-size products, $P_1Q_0$ and $P_0Q_1$, only their sum and not the separate products is needed. This is important, because there’s a way to calculate their sum without calculating both products! The trick is to use subtraction and the polynomial identity

(9.4)    $P_1Q_0 + P_0Q_1 = (P_1 + P_0)(Q_1 + Q_0) - P_1Q_1 - P_0Q_0.$
Since each of $P_1$ and $P_0$ has n/2 coefficients, so does their sum $P_1 + P_0$. Similarly, $Q_1 + Q_0$ has n/2 coefficients, and the product $(P_1 + P_0)(Q_1 + Q_0)$ can be computed with two half-size polynomial additions and one half-size polynomial multiplication. Since (9.2) also needs both $P_1Q_1$ and $P_0Q_0$, the whole product PQ can be computed with three multiplications, four additions, and two subtractions of half-size polynomials. So we’ve replaced one of the multiplications in the earlier algorithm with four additions and two subtractions. This would be a loss if these operations all had the same cost. But the operations are polynomial operations, and additions and subtractions cost $\Theta(n)$, while multiplications (so far) have cost $\Theta(n^2)$. Therefore, the gain is enormous. We obtain the recurrence

(9.5)    $T(n) = 3T(n/2) + Bn$,


where the constant B in this equation is slightly larger than the constant b in the previous recurrence (9.3). It’s easy to check that

$$T(n) = a\,n^{\log_2 3} - 2Bn$$

satisfies this recurrence, where the constant $a = T(1) + 2B$ is determined by the initial condition. Therefore, $T(n) = \Theta(n^{\log_2 3})$, where $\log_2 3 \approx 1.585$, and this new algorithm is faster than the $\Theta(n^2)$ algorithm for large enough n. (In Exercise 9.18 you see that this algorithm can be used to obtain the promised $\Theta(n^{\log_2 3})$ multiplication algorithm for integers.)
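The three-product scheme is commonly known as Karatsuba’s method; the sketch below applies it to polynomials, assuming coefficient lists stored constant term first and factors padded with zeros to $2^k$ coefficients. All names are hypothetical. Each level makes the three half-size products and recovers the middle term of (9.2) from identity (9.4).

```python
def poly_add(p, q):
    """Coefficientwise sum of two coefficient lists of possibly different lengths."""
    if len(p) < len(q):
        p, q = q, p
    return [x + (q[i] if i < len(q) else 0) for i, x in enumerate(p)]


def poly_sub(p, q):
    """Coefficientwise difference p - q."""
    return poly_add(p, [-y for y in q])


def shift(p, k):
    """Multiply a polynomial by x^k."""
    return [0] * k + p


def three_product_mult(p, q):
    """Multiply equal-length lists (length a power of 2) with 3 half-size products."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    m = n // 2
    p0, p1 = p[:m], p[m:]
    q0, q1 = q[:m], q[m:]
    high = three_product_mult(p1, q1)                               # P1 Q1
    low = three_product_mult(p0, q0)                                # P0 Q0
    mixed = three_product_mult(poly_add(p1, p0), poly_add(q1, q0))  # (P1+P0)(Q1+Q0)
    middle = poly_sub(poly_sub(mixed, high), low)                   # identity (9.4)
    return poly_add(poly_add(shift(high, 2 * m), shift(middle, m)), low)


def multiply(p, q):
    """Pad to 2^k coefficients, multiply, and trim to len(p) + len(q) - 1 terms."""
    size = 1
    while size < max(len(p), len(q)):
        size *= 2
    padded_p = p + [0] * (size - len(p))
    padded_q = q + [0] * (size - len(q))
    return three_product_mult(padded_p, padded_q)[: len(p) + len(q) - 1]


if __name__ == "__main__":
    print(multiply([1] * 8, [1] * 8))
```

The example squares $1 + x + \cdots + x^7$; the coefficients rise from 1 to 8 and fall back to 1.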

What about the implied constant in this algorithm as compared with the one in the standard algorithm? Table 9.2 records a comparison of the run times for the two algorithms for various pairs of polynomials with n coefficients. In these examples we were careful to use polynomials with small-integer coefficients so that the efficiency of either algorithm is unaffected by the size of the coefficients. Before we use the data to estimate the constants, we should ask if the data is consistent with the predicted leading term of our run time formulas. For this, we can look at the ratio T(2n)/T(n). For a $\Theta(n^2)$ algorithm this ratio should approach 4, while for a $\Theta(n^{\log_2 3})$ algorithm this ratio should approach 3. The data shows that for n > 512, these ratios are reasonably close to their predicted asymptotic values. To estimate the leading constants, we can use the ratios $T_I(n)/n^2$ and $T_R(n)/n^{\log_2 3}$, where $T_I$ and $T_R$ are the run times of the iterative and recursive algorithms. For the larger values of n, these ratios settle down, and so we can come up with approximate asymptotic run time formulas: $T_I(n) \approx 0.00047\,n^2$ and $T_R(n) \approx 0.03\,n^{\log_2 3}$ (setting the two formulas equal puts the crossover at roughly $n \approx 2 \times 10^4$). In conclusion, this data shows that the “faster” algorithm will be faster, but only for rather large n, and the data is sufficient to give formulas that allow us to predict the run times of these implementations, but again only for large values of n.
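The procedure just described, checking the doubling ratios first and then estimating the leading constant, is easy to automate. A minimal sketch: the helper names are hypothetical, and the timing table is synthetic, generated from an assumed model $T(n) = 0.0005n^2 + 0.01n + 2$ rather than taken from Table 9.2.

```python
def doubling_ratios(times):
    """Given {n: T(n)} for doubling n, return {n: T(2n) / T(n)}."""
    return {n: times[2 * n] / times[n] for n in sorted(times) if 2 * n in times}


def leading_constants(times, exponent):
    """Estimate C in T(n) ~ C * n**exponent by the ratios T(n) / n**exponent."""
    return {n: t / n ** exponent for n, t in sorted(times.items())}


# Hypothetical timings (ms) from a made-up model; NOT the data of Table 9.2.
sizes = [2 ** k for k in range(4, 15)]
fake_times = {n: 0.0005 * n ** 2 + 0.01 * n + 2.0 for n in sizes}

if __name__ == "__main__":
    ratios = doubling_ratios(fake_times)
    consts = leading_constants(fake_times, 2)
    for n in sizes:
        print(n, round(ratios.get(n, float("nan")), 3), consts[n])
```

For small n the ratios are far from 4 because the lower-order terms still dominate, and only for the larger sizes do both columns settle down. For a $\Theta(n^{\log_2 3})$ algorithm one would pass `exponent=math.log2(3)` to `leading_constants`.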


268 9. Computational Complexity 


TABLE 9.2. Comparison of the run times for two polynomial multiplication algorithms. The iterative algorithm is the classical method with run time $\Theta(n^2)$. The recursive algorithm is the half-size method with run time $\Theta(n^{\log_2 3})$. The ratios T(2n)/T(n) are very close to the predicted values of 4 and 3. The “faster” algorithm catches up to the “slower” method for large n.

[Table: Run Times for n-coefficient Polynomial Multiplication, with columns for Iterative Time (ms) and Recursive Time (ms)]


9.4 Simple Divide-and-Conquer Algorithms 


Each of the divide-and-conquer algorithms examined in the last section led to a recurrence relation for its run time. There was a strong similarity between the two arguments, which we can generalize to derive a recurrence for the run time of a whole class of divide-and-conquer algorithms. The recurrences (often called divide-and-conquer recurrences) have three types of solutions, and we give examples of algorithms that illustrate each type.

In each of our two previous examples, the problem of polynomial multi- 
plication was split into two subproblems of the same type that were half the 
original size. Here we generalize this slightly and assume that a problem of 
size n is split into several (say a) subproblems, each of size n/c for some 
constant c. For many divide-and-conquer algorithms c = 2 holds, but the 
same analysis works for general c. 

In some cases the splitting process is easy and has negligible computa- 
tional cost, but we want to allow for the possibility that splitting takes 
some time. Usually the time-consuming part of a divide-and-conquer algo- 
rithm is the combining step, when the answers to subproblems are used to 
compute the answer to the whole problem. Here is the general form of a 
divide-and-conquer algorithm: 
The general form of a divide-and-conquer algorithm

PROCEDURE DandC(DATA, n, SOLUTION)
IF n is small
THEN Solve by some special algorithm
ELSE SPLIT(DATA, n) into (D_1, n/c)
     DandC(D_1, n/c, S_1)
     D_2 := f_2(DATA, D_1, S_1)
     DandC(D_2, n/c, S_2)
     D_3 := f_3(DATA, D_1, S_1, D_2, S_2)
       ...
     D_a := f_a(DATA, D_1, S_1, D_2, S_2, ..., D_{a-1}, S_{a-1})
     DandC(D_a, n/c, S_a)
     SOLUTION := f_{a+1}(DATA, D_1, S_1, D_2, S_2, ..., D_a, S_a)
RETURN(SOLUTION)


Here we’ve allowed for the results of previous subproblems to be processed and then fed into the next subproblem. The function $f_{a+1}$ performs the final combination of results to produce the solution.

An efficient algorithm should have a run time that is bounded by a function that is polynomial in the size of the input or perhaps the size of the output. For this reason we assume that the time for splitting and combining is given by a polynomial in $n$. Since we’re interested in only the order of the run time, we can concentrate on the largest term of the polynomial and assume that the run time for the split and combine steps is $bn^m$ for some positive constant $b$ and some constant exponent $m$. (For this to be a polynomial, $m$ must be an integer, but it costs nothing more to allow $m$ to be any nonnegative real number.)

Under these assumptions the run time for a divide-and-conquer algorithm that splits a problem of size $n$ into $a$ subproblems of size $n/c$ is given by the recurrence

$$T(n) = aT(n/c) + bn^m \tag{9.6}$$

(compare this with (9.3) and (9.5)). The initial condition comes from the work done by the algorithm that handles problems of small size. Since the small-size algorithm is run only on data of bounded size, we can bound its run time by some positive constant. This allows us to assume that the initial condition $T(1)$ is positive but otherwise unknown.

As in the last section, we essentially use logarithms to solve the recurrence (9.6). Setting $n = c^k$ and $t_k = T(c^k)$, (9.6) becomes

$$t_k = a\,t_{k-1} + b\,c^{km}, \qquad t_0 > 0,$$

a nonnegative difference equation with $\lambda_0 = a$. From Theorem 5.5.3 we know that the relative sizes of $a$ and $c^m$ yield three different orders of magnitude for the sequence $(t_k)$:

$$t_k = \begin{cases} O(c^{km}) & \text{for } a < c^m,\\ O(k\,c^{km}) & \text{for } a = c^m,\\ O(a^k) & \text{for } a > c^m. \end{cases}$$

In order to translate this back into the run time $T(n)$, we recall that $n = c^k$ (so $k = \log_c(n)$ and $c^{km} = n^m$) and $t_k = T(c^k)$, and get

$$T(n) = \begin{cases} O(n^m) & \text{for } a < c^m,\\ O(\log_c(n)\, n^m) & \text{for } a = c^m,\\ O(n^{\log_c(a)}) & \text{for } a > c^m. \end{cases}$$

(The last line follows from the observation that $a^k = c^{\log_c(a^k)} = c^{k\log_c(a)} = n^{\log_c(a)}$.) Therefore, there are three different types of behavior that can occur for the run time of a divide-and-conquer algorithm.


Run Time for Divide-and-Conquer Algorithms 


Theorem 9.4.1. Suppose that a divide-and-conquer algorithm splits a problem of size $n$ into $a$ subproblems, each of size $n/c$, and that the sum of the run times for SPLIT and COMBINE is a polynomial $p_m(n)$ of degree $m$. Assuming that $T(1)$ is positive, the run time for the divide-and-conquer algorithm satisfies

$$T(n) = aT(n/c) + p_m(n),$$

and has order

$$T(n) = \begin{cases} O(n^m) & \text{if } a < c^m,\\ O(n^m \log(n)) & \text{if } a = c^m,\\ O(n^{\log_c(a)}) & \text{if } a > c^m. \end{cases}$$
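The case analysis is mechanical enough to script. The Python sketch below (function names are our own) classifies a recurrence by comparing $a$ with $c^m$, and also iterates $t_k = a\,t_{k-1} + b\,c^{km}$ exactly, taking $b = t_0 = 1$ as an arbitrary normalization, so that the three growth patterns can be seen numerically.

```python
import math

def dc_case(a, c, m):
    """Classify T(n) = a*T(n/c) + b*n**m by the three cases of Theorem 9.4.1.

    Returns (case number, order as a string). Pass integers so that the
    comparison a == c**m is exact."""
    cm = c ** m
    if a < cm:
        return 1, f"O(n^{m})"
    if a == cm:
        return 2, f"O(n^{m} log n)"
    return 3, f"O(n^{math.log(a, c):.3f})"

def solve_recurrence(a, c, m, b=1, t0=1, k_max=20):
    """Return [t_0, ..., t_kmax] where t_k = T(c**k) = a*t_{k-1} + b*c**(k*m)."""
    t = [t0]
    for k in range(1, k_max + 1):
        t.append(a * t[-1] + b * c ** (k * m))
    return t

# Parameters (a, c, m) of the examples in this section.
examples = {"schoolbook polynomial": (4, 2, 1),
            "three-product polynomial": (3, 2, 1),
            "block matrix": (8, 2, 2),
            "Strassen": (7, 2, 2),
            "MERGESORT": (2, 2, 1),
            "Newton reciprocal": (1, 2, 2)}
for name, (a, c, m) in examples.items():
    print(name, dc_case(a, c, m))
```

With $b = t_0 = 1$ the iterates have simple closed forms that match the three cases: $t_k = 3^{k+1} - 2^{k+1}$ for $(a, c, m) = (3, 2, 1)$, $t_k = (k+1)2^k$ for $(2, 2, 1)$, and $t_k = (4^{k+1} - 1)/3$ for $(1, 2, 2)$.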


9.4.1 Example 1: A return to polynomial multiplication 


In our two divide-and-conquer algorithms for polynomial multiplication, $c$ was equal to 2 because the size of each subproblem was half the size of the original problem. Since the split and combine operations used only addition and subtraction of polynomials, their combined run time was a linear polynomial, giving $c^m = c^1 = 2$. The straightforward algorithm had $a = 4 > c^m$, while in the improved algorithm using (9.4), $a = 3 > c^m$, and the third case of the theorem applies for both. As we saw earlier, the algorithm with $a = 3$ is eventually faster than the one with $a = 4$, since the smaller value $a = 3$ reduces the run time from $O(n^2)$ to $O(n^{\log_2 3})$. If we could somehow further reduce the number of subproblems to $a = 2$, the second case would apply and we would have $T(n) = O(n \log n)$. No known divide-and-conquer algorithm for polynomial multiplication has $a = 2$, but in Section 9.5 we’ll see that a completely different type of algorithm for polynomial multiplication does achieve the complexity $O(n \log n)$.
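A three-product half-size method can be sketched in Python as follows. This sketch rests on an assumption: identity (9.4) is not reproduced in this section, so it uses the standard Karatsuba identity $(P_1 + P_2x^h)(Q_1 + Q_2x^h) = P_1Q_1 + [(P_1+P_2)(Q_1+Q_2) - P_1Q_1 - P_2Q_2]\,x^h + P_2Q_2\,x^{2h}$, which likewise needs only three half-size products, so its run time obeys $T(n) = 3T(n/2) + bn$. Coefficient lists are stored constant term first, their common length is assumed to be a power of 2, and all names are our own.

```python
def poly_add(p, q):
    """Add two coefficient lists of equal length."""
    return [x + y for x, y in zip(p, q)]

def poly_sub(p, q):
    """Subtract two coefficient lists of equal length."""
    return [x - y for x, y in zip(p, q)]

def karatsuba(p, q):
    """Multiply p, q (equal length n, a power of 2) with three half-size products."""
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    h = n // 2
    p1, p2 = p[:h], p[h:]                  # p = p1 + p2 * x^h
    q1, q2 = q[:h], q[h:]                  # q = q1 + q2 * x^h
    low = karatsuba(p1, q1)                # p1*q1
    high = karatsuba(p2, q2)               # p2*q2
    both = karatsuba(poly_add(p1, p2), poly_add(q1, q2))
    mid = poly_sub(poly_sub(both, low), high)   # p1*q2 + p2*q1
    result = [0] * (2 * n - 1)
    for i, v in enumerate(low):
        result[i] += v
    for i, v in enumerate(mid):
        result[i + h] += v
    for i, v in enumerate(high):
        result[i + 2 * h] += v
    return result

def naive_multiply(p, q):
    """Schoolbook product, O(n^2), used only for comparison."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

print(karatsuba([1, 1], [1, 1]))   # [1, 2, 1], the coefficients of (1 + x)^2
```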


9.4.2 Example 2: Matrix multiplication


Another problem in which divide-and-conquer leads to a fast algorithm is matrix multiplication. If two $n \times n$ matrices are split into four $n/2 \times n/2$ matrices, we can compute the matrix product using

$$\begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} B_{11} & B_{12}\\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12}\\ C_{21} & C_{22} \end{pmatrix},$$

where

$$\begin{aligned} C_{11} &= A_{11}B_{11} + A_{12}B_{21},\\ C_{12} &= A_{11}B_{12} + A_{12}B_{22},\\ C_{21} &= A_{21}B_{11} + A_{22}B_{21},\\ C_{22} &= A_{21}B_{12} + A_{22}B_{22}, \end{aligned}$$

which uses eight half-size multiplications. Because addition of $n \times n$ matrices can be done in time proportional to $n^2$, the equation for the run time of this divide-and-conquer algorithm is

$$T(n) = 8T(n/2) + bn^2.$$

This equation has $a = 8$, $c = 2$, $m = 2$, giving $a > c^m$ and $T(n) = O(n^{\log_2 8}) = O(n^3)$, which is the same as the order of magnitude of the run time for the standard row-times-column algorithm for matrix multiplication.

A faster divide-and-conquer algorithm was designed by Strassen [156], using seven half-size multiplications. Strassen’s algorithm computes the seven products

$$\begin{aligned} M_1 &= (A_{12} - A_{22})(B_{21} + B_{22}), & M_2 &= (A_{11} + A_{22})(B_{11} + B_{22}),\\ M_3 &= (A_{11} - A_{21})(B_{11} + B_{12}), & M_4 &= (A_{11} + A_{12})B_{22},\\ M_5 &= A_{11}(B_{12} - B_{22}), & M_6 &= A_{22}(B_{21} - B_{11}),\\ M_7 &= (A_{21} + A_{22})B_{11}, \end{aligned}$$

and uses them in

$$\begin{aligned} C_{11} &= M_1 + M_2 - M_4 + M_6, & C_{12} &= M_4 + M_5,\\ C_{21} &= M_6 + M_7, & C_{22} &= M_2 - M_3 + M_5 - M_7, \end{aligned}$$

to obtain the product matrix. It is a simple exercise in algebra to prove that this algorithm is correct; the real difficulty was discovering the algorithm in the first place! Because addition and subtraction of matrices can be carried out in time proportional to $n^2$, the run time of Strassen’s algorithm obeys the equation

$$T(n) = 7T(n/2) + bn^2,$$

and, since $a = 7 > c^m = 4$, $T(n) = O(n^{\log_2 7})$, where $\log_2 7 \approx 2.807$.
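The seven products and four combinations above can be checked directly with a short Python sketch. It assumes matrices are lists of lists whose size is a power of 2 and recurses all the way down to $1 \times 1$ blocks (practical versions switch to the standard algorithm much earlier); the helper names are our own, and the row-times-column product is included only for comparison.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def quadrants(M):
    """Split a square matrix of even size into M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])

def strassen(A, B):
    """Multiply n x n matrices (n a power of 2) using seven half-size products."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = quadrants(A)
    B11, B12, B21, B22 = quadrants(B)
    M1 = strassen(mat_sub(A12, A22), mat_add(B21, B22))
    M2 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    M3 = strassen(mat_sub(A11, A21), mat_add(B11, B12))
    M4 = strassen(mat_add(A11, A12), B22)
    M5 = strassen(A11, mat_sub(B12, B22))
    M6 = strassen(A22, mat_sub(B21, B11))
    M7 = strassen(mat_add(A21, A22), B11)
    C11 = mat_add(mat_sub(mat_add(M1, M2), M4), M6)   # M1 + M2 - M4 + M6
    C12 = mat_add(M4, M5)
    C21 = mat_add(M6, M7)
    C22 = mat_sub(mat_add(mat_sub(M2, M3), M5), M7)   # M2 - M3 + M5 - M7
    return join(C11, C12, C21, C22)

def row_times_column(A, B):
    """Standard O(n^3) product, for comparison."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

With integer entries the two products agree exactly, since no rounding is involved.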


9.4.3 Example 3: MERGESORT 


A concrete example of the second type of divide-and-conquer behavior is 
the MERGESORT algorithm. In this algorithm the input is a one-dimensional 
array with n entries, and the output is the array in which the original entries 
are rearranged into increasing order. It is a divide-and-conquer algorithm, 
with the array divided into two subarrays that are individually sorted and 
then combined to give the sorted array. The following algorithm captures 
this approach. 


PROCEDURE MERGESORT(A, n)
IF n > 1
THEN Split A into two arrays of size n/2, A_1 and A_2
     MERGESORT(A_1, n/2)
     MERGESORT(A_2, n/2)
     MERGE(A_1, A_2)

When $n = 1$ is reached there is only one item in the array, and no sorting needs to be done. All of the real work gets done in the MERGE algorithm, which combines two sorted arrays $A_1$ and $A_2$ into one sorted array $A$. Without going into the details, we'll say that MERGE can be accomplished with at most $n - 1$ comparisons of entries and $n$ moves. This yields the recurrence

$$T(n) = 2T(n/2) + bn,$$

for some constant $b$. Since $a = 2$ and $c^m = 2^1 = 2$, we are in the second case, and $T(n) = O(n \log n)$.
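A direct Python transcription makes the comparison count concrete. This is a sketch with names of our own choosing; unlike the text, it does not require $n$ to be a power of 2 (the two halves may then differ in size by one). Each call to `merge` that produces $n$ entries makes at most $n - 1$ comparisons.

```python
def merge(left, right):
    """MERGE two sorted lists; return (merged list, number of comparisons)."""
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])     # at most one of these two slices is nonempty
    out.extend(right[j:])
    return out, comparisons

def mergesort(a):
    """Return (sorted copy of a, total comparisons of entries)."""
    if len(a) <= 1:
        return list(a), 0
    h = len(a) // 2
    left, c1 = mergesort(a[:h])
    right, c2 = mergesort(a[h:])
    merged, c3 = merge(left, right)
    return merged, c1 + c2 + c3

print(mergesort(list(range(16, 0, -1))))   # sorted list, 32 comparisons
```

For the reversed input of length 16, every merge compares only until the right half is used up, giving $(n/2)\log_2 n = 32$ comparisons in total.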
9.4.4 Example 4: Applications of Newton’s method


For an example of the first type of behavior we use Newton’s method to estimate values of the reciprocal function $g(x) = 1/x$ and the square root function $g(x) = \sqrt{x}$. These are special cases of estimating functions $g(x)$ at values $x = A$ when $g(x)$ is a one-to-one function in some neighborhood of $A$. The assumption that $g(x)$ is one-to-one ensures that it has a left inverse $f(x)$ with $f(g(x)) = x$ in a neighborhood of $x = A$. We further assume that the left inverse is also one-to-one and twice differentiable in this neighborhood. These conditions hold for each of $g(x) = \sqrt{x}$ and $g(x) = 1/x$ on $x > 0$. Since $f(g(A)) = A$, the value $g(A)$ solves $f(x) - A = 0$, so any approximation to this solution is an approximation to $g(A)$.

We’ve already used Newton’s method (refer to Section 5.4) to locate the roots of $f(x) = A$ by iterating

$$N(x) = x - \frac{f(x) - A}{f'(x)}.$$

For this to be computationally feasible, the right side should be in a form that’s easy to compute, which for us means that it can be computed in time that is polynomial in the size of the input. Under reasonable hypotheses (refer to [20, 155]), when a good initial approximation is used, each iteration of Newton’s method doubles the number of correct digits. (Also refer to Theorem 5.4.1.) For instance, if an approximation $x$ agrees with the root on the first five bits, then $N(x)$ agrees on the first ten bits, and applying the iteration another time, $N^{(2)}(x)$ agrees on twenty bits, and so forth. Therefore, to obtain $n$ correct bits it suffices to compute an approximation correct to $n/2$ bits and then apply one more iteration at precision $n$, so for input of size $n$ the run time $T(n)$ satisfies the divide-and-conquer recurrence

$$T(n) = T(n/2) + p(n),$$

where the polynomial $p(n)$ records the difficulty of computing $N(x)$ for the particular function under consideration. Newton’s method is thus a divide-and-conquer algorithm, provided $N(x)$ can be computed from $x$ in polynomial time.

Consider the reciprocal function $g(x) = 1/x$ on $x > 0$, a one-to-one function that is its own inverse. Since $f'(x) = -1/x^2$, Newton’s formula for the iteration is

$$N(x) = x - \frac{1/x - A}{-1/x^2} = x(2 - Ax).$$

In this form, only two multiplications and one subtraction (and no divisions) of $n$-bit numbers are required, and the run time satisfies

$$T(n) = T(n/2) + p_2(n) \tag{9.7}$$

for some quadratic polynomial $p_2(n)$. Here $a = 1$, $c^m = 2^2 = 4$, and we're in the first case, with run time $T(n) = O(n^2)$. Since division of real numbers can be performed using the algorithm implicit in $B/A = B \cdot (1/A)$, this also yields an $O(n^2)$ algorithm for division.
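The doubling of correct digits is exact for this iteration, because $1 - A\,x(2 - Ax) = (1 - Ax)^2$: every step squares the residual $1 - Ax$. The Python sketch below (function name ours) runs the iteration in exact rational arithmetic with `fractions.Fraction`, so it shows the convergence only and ignores the truncation to $n/2$ and $n$ bits assumed in the run time analysis; the values $A = 3$ and $x_0 = 1/4$ are arbitrary choices.

```python
from fractions import Fraction

def reciprocal_iterates(A, x0, steps):
    """Newton iterates x_{k+1} = x_k (2 - A x_k) for 1/A.

    Each step uses two multiplications and one subtraction, and no division."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x * (2 - A * x))
    return xs

A = Fraction(3)
xs = reciprocal_iterates(A, Fraction(1, 4), 5)
residuals = [1 - A * x for x in xs]   # 1/4, 1/16, 1/256, ...: each the square of the last
print([float(x) for x in xs])
```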

Finding square roots can also be performed in $O(n^2)$ time using Newton’s method. The one-to-one function $g(x) = \sqrt{x}$ has the differentiable inverse function $f(x) = x^2$, which is one-to-one on $x > 0$. Since $f'(x) = 2x$, the Newton iteration is

$$N(x) = x - \frac{x^2 - A}{2x} = \frac{x}{2} + \frac{A/2}{x},$$

where $A/2$ is an easy one-time calculation.² The complicated part is the division by $x$, for which we use the quadratic-time algorithm above. Therefore, the run time again satisfies (9.7) for some quadratic polynomial $p_2(n)$, and $T(n) = O(n^2)$.
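For integers the same iteration is usually written $x \leftarrow \lfloor (x + \lfloor A/x \rfloor)/2 \rfloor$, which is $N(x) = x/2 + (A/2)/x$ rounded down. The Python sketch below (name ours) is a minimal version of this classical integer square root; unlike the text, it uses Python's built-in integer division for $\lfloor A/x \rfloor$ rather than the Newton reciprocal algorithm.

```python
def newton_isqrt(A):
    """Largest integer r with r*r <= A, by the integer Newton iteration."""
    if A < 0:
        raise ValueError("A must be nonnegative")
    if A < 2:
        return A
    x = 1 << ((A.bit_length() + 1) // 2)   # a power of 2 that is >= sqrt(A)
    while True:
        y = (x + A // x) // 2              # floor of x/2 + (A/2)/x
        if y >= x:                         # iterates stop decreasing at floor(sqrt(A))
            return x
        x = y

print(newton_isqrt(10 ** 40 + 12345))      # 100000000000000000000
```

Starting above $\sqrt{A}$ makes the iterates decrease monotonically, so the first time they fail to decrease the current value is $\lfloor\sqrt{A}\rfloor$.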

It’s worth noting that this discussion shows that both division and finding square roots can be accomplished in the same order of time as multiplication. Although we have used the standard $O(n^2)$ multiplication algorithm, the same argument shows that any multiplication algorithm whose order is at least $O(n)$ yields division and square root algorithms of the same order. The technique of the next section gives a multiplication algorithm whose order is only slightly worse than $O(n)$, and this faster multiplication can be used to speed up both division and square root.




9.5 The Fast Fourier Transform 


In our general discussion of divide-and-conquer algorithms we referred to an algorithm for polynomial multiplication that has order $O(n \log n)$, and this section is devoted to a description and explanation of the method. It is based on the technique known as the Fast Fourier Transform, usually abbreviated as FFT. The FFT-based polynomial multiplication algorithm is designed for dense polynomials, polynomials in which almost all of the coefficients are nonzero. Many large-scale polynomial multiplication problems involve sparse polynomials, and for such problems there are other algorithms that easily outperform the FFT. So there are practical issues that must be considered before the theoretically good FFT algorithm is used.

The key difference between the FFT method for polynomial multiplication and our earlier methods is that here a polynomial is represented by some of its functional values rather than by its coefficients. The basis for the technique is the fact that an $(n-1)$st-degree polynomial is uniquely determined by its values at $n$ distinct points. To see this, consider the evaluation of a polynomial $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1}$ at the $n$ different complex numbers $x = \lambda_1, \lambda_2, \ldots, \lambda_n$. Writing this as a matrix equation, we have

$$(a_0, a_1, \ldots, a_{n-1}) \begin{pmatrix} 1 & 1 & \cdots & 1\\ \lambda_1 & \lambda_2 & \cdots & \lambda_n\\ \vdots & \vdots & & \vdots\\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \end{pmatrix} = (f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n)), \tag{9.8}$$

where the matrix is the Vandermonde matrix $V$ associated with $\lambda_1, \ldots, \lambda_n$ (refer to Chapter 2). When the $\lambda_j$ are distinct, we know that $V$ is invertible, and accordingly, the coefficients of $f(x)$ satisfy

$$(a_0, a_1, \ldots, a_{n-1}) = (f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n))\, V^{-1}, \tag{9.9}$$

and so the coefficients can be computed from $n$ values of the polynomial. Such a process is called interpolation, and (9.9) is called an interpolation formula. Schematically, we have the bijections

$$\text{Coefficients} \;\underset{\text{INTERPOLATION}}{\overset{\text{EVALUATION}}{\rightleftarrows}}\; \text{Values}$$

defined by

$$(a_0, a_1, \ldots, a_{n-1}) \;\overset{\text{EVALUATION}}{\longmapsto}\; (f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n)) \;\overset{\text{INTERPOLATION}}{\longmapsto}\; (a_0, a_1, \ldots, a_{n-1}),$$

where EVALUATION and INTERPOLATION are the processes that take us from one representation to the other.

²Here it’s helpful to represent numbers in binary notation, because $A/2$ is just a binary shift of $A$ and so is computationally trivial.

For the purpose of polynomial multiplication, encoding a polynomial us- 
ing n of its functional values is superior to using its coefficients because to 
obtain the corresponding value for the product polynomial we need only 
multiply the respective values of the factor polynomials. But there are some 
problems. First, the product of two polynomials of degree $n-1$ is a polynomial of degree $2(n-1)$, and therefore the product polynomial is determined by $2n-1$ values. More seriously, you might remember from our discussion of Horner’s Method in Chapter 5 that in general it takes $O(n)$ operations to perform one evaluation of a polynomial of degree $n-1$ and hence $O(n^2)$ operations to evaluate it $n$ times. In fact, standard interpolation techniques take $O(n^2)$ operations to interpolate a polynomial of degree $n-1$ through $n$ generic points. What saves us here is the freedom to choose the $n$ points in a way that speeds up both evaluation and interpolation.

For example, evaluating $f(0)$ takes no work at all, and evaluating $f(1)$ takes $n-1$ additions and no multiplications. Similarly, evaluation of $f(-1)$ also takes no multiplications, since it’s the sum of the coefficients that have even indices minus the sum of the other coefficients. When $\lambda_1, \ldots, \lambda_n$ are chosen to be the $n$th roots of unity, we will give algorithms for both EVALUATION and INTERPOLATION that have run time $O(n \log n)$.


9.5.1 The general form of the Fast Fourier Transform 


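Using terminology from Chapter 8, the set $G_n$ of $n$th roots of unity is a group under multiplication and has $n$ elements. The principal $n$th root of unity,

$$w = \cos(\theta) + i\sin(\theta) \quad\text{for } \theta = \frac{2\pi}{n},$$

has order $n$ in this group, which means that every element of $G_n$ can be uniquely written as $w^j$ for some $j = 0, 1, \ldots, n-1$. Since $1, w, w^2, \ldots, w^{n-1}$ are all distinct, specializing to $\lambda_j = w^{j-1}$ in the Vandermonde matrix of (9.8) gives the $n \times n$ invertible (and symmetric) matrix

$$V(w) = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1\\ 1 & w & w^2 & \cdots & w^{n-1}\\ 1 & w^2 & w^4 & \cdots & w^{(n-1)\cdot 2}\\ 1 & w^3 & w^6 & \cdots & w^{(n-1)\cdot 3}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & w^{n-1} & w^{2(n-1)} & \cdots & w^{(n-1)(n-1)} \end{pmatrix}. \tag{9.10}$$

Noting that for $0 \le j, k < n$ the dot product satisfies

$$(1, w^j, w^{2j}, \ldots, w^{(n-1)j}) \cdot (1, w^{-k}, w^{-2k}, \ldots, w^{-(n-1)k}) = \sum_{l=0}^{n-1} w^{l(j-k)},$$

from (4.36) on page 87 we have

$$(1, w^j, \ldots, w^{(n-1)j}) \cdot (1, w^{-k}, \ldots, w^{-(n-1)k}) = \begin{cases} n & \text{if } j = k,\\ 0 & \text{if } j \ne k. \end{cases}$$

Therefore, the inverse of $V(w)$ is

$$V(w)^{-1} = \frac{1}{n}\begin{pmatrix} 1 & 1 & \cdots & 1\\ 1 & w^{-1} & \cdots & w^{-(n-1)}\\ 1 & w^{-2} & \cdots & w^{-2(n-1)}\\ \vdots & \vdots & & \vdots\\ 1 & w^{-(n-1)} & \cdots & w^{-(n-1)^2} \end{pmatrix} = \frac{1}{n}V(w^{-1}),$$

and, because $w^{-1} = \overline{w}$,

$$V(w)^{-1} = \frac{1}{n}V(\overline{w}) = \frac{1}{n}\overline{V(w)}. \tag{9.11}$$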
Writing $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1}$, $A = (a_0, \ldots, a_{n-1})^T$, and $Y = (f(1), f(w), \ldots, f(w^{n-1}))^T$, equations (9.8) and (9.9) become (after transposing and using the symmetry of $V(w)$)

$$V(w)A = Y \tag{9.12}$$

and

$$A = \frac{1}{n}\overline{V(w)}\,Y, \tag{9.13}$$

which respectively correspond to EVALUATION and INTERPOLATION. The very special structure of these matrices will allow us to construct quick divide-and-conquer algorithms when $n$ is chosen to be a power of 2.
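Before turning to the fast algorithm, it may help to see (9.12) and (9.13) carried out directly. The Python sketch below (function names are our own) builds a table of the powers of $w = e^{2\pi i/n}$ and applies $V(w)$ and $\frac{1}{n}\overline{V(w)}$ as plain matrix-vector products, so each direction costs $O(n^2)$ operations; the divide-and-conquer structure developed next is what brings this down to $O(n \log n)$.

```python
import cmath

def roots_table(n):
    """Powers w**0, ..., w**(n-1) of the principal n-th root w = exp(2*pi*i/n)."""
    return [cmath.exp(2j * cmath.pi * t / n) for t in range(n)]

def evaluate_naive(a):
    """EVALUATION (9.12): Y = V(w) A, i.e. Y[j] = f(w**j); O(n^2) operations."""
    n = len(a)
    w = roots_table(n)
    return [sum(a[k] * w[(j * k) % n] for k in range(n)) for j in range(n)]

def interpolate_naive(y):
    """INTERPOLATION (9.13): A = (1/n) conj(V(w)) Y; O(n^2) operations."""
    n = len(y)
    w = roots_table(n)
    return [sum(y[k] * w[(j * k) % n].conjugate() for k in range(n)) / n
            for j in range(n)]

y = evaluate_naive([2, 1, 0, 0])     # f(x) = 2 + x at x = 1, i, -1, -i
a = interpolate_naive(y)             # recovers (2, 1, 0, 0) up to round-off
```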

Historically, the term Discrete Fourier Transform (or DFT) was used for the general idea described above, and Fast Fourier Transform was reserved for the specific implementation with $n = 2^k$. Nowadays, this distinction is blurred, and both are referred to as the FFT.


9.5.2 The FFT when n = 2^k


We set $n = 2^k$ (where $k$ is a natural number), and let $V_k$ denote the Vandermonde matrix $V(w)$ constructed using the $n$th roots of unity as in (9.10). We assume that the powers of the principal $n$th root of unity $w$ have been found and stored in a table. The FFT also uses the $(n/2)$th roots of unity, which are already in the table and are found by proceeding through the table in steps of size two. We now establish an iterative process for getting $V(w)$ from $V(w^2)$. This will allow us to construct fast divide-and-conquer algorithms for EVALUATION and INTERPOLATION.

The matrices for the first, second, and fourth principal roots of unity are respectively

$$V(1) = \begin{pmatrix} 1 \end{pmatrix}, \quad V(-1) = \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}, \quad\text{and}\quad V(i) = \begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & i & -1 & -i\\ 1 & -1 & 1 & -1\\ 1 & -i & -1 & i \end{pmatrix}.$$

Note that the first column of $V(-1)$ is a stack of two $V(1)$'s, and its second column is $\begin{pmatrix} V(1)\\ -V(1) \end{pmatrix}$. Interchanging the second and third columns of $V(i)$ gives

$$\begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & -1 & i & -i\\ 1 & 1 & -1 & -1\\ 1 & -1 & -i & i \end{pmatrix}, \tag{9.14}$$

whose first two columns form a stack of two copies of $V(-1)$, and whose last two columns form the stack $\begin{pmatrix} DV(-1)\\ -DV(-1) \end{pmatrix}$, where

$$D = \begin{pmatrix} 1 & 0\\ 0 & i \end{pmatrix} \quad\text{and}\quad DV(-1) = \begin{pmatrix} 1 & 1\\ i & -i \end{pmatrix}.$$

Although it’s somewhat of a stretch, from this it can be seen that $V(i)$ is built in two steps. First construct a block matrix of four $2 \times 2$ blocks, each of which is either $V(i^2)$ or $\pm DV(i^2)$, the product of $V(i^2)$ with the diagonal matrix $D$. Then $V(i)$ is obtained from this matrix by performing a suitable permutation of columns. The surprising news is that this is the general procedure for passing from $V(w^2)$ to $V(w)$!

In what follows, the indexing of rows and columns of $2^k \times 2^k$ matrices will begin with 0. We want to determine the permutation of the columns of $V(w)$ that is used. For $k = 2$, we use the binary representations of 0, 1, 2, 3,

00, 01, 10, and 11,

to define the permutation $\mathrm{Rev}_2$ obtained by reversing the bits,

00, 10, 01, and 11.

Applying $\mathrm{Rev}_2$ to the column numbers of $V(i)$ interchanges the second and third columns and keeps the other two fixed, and this is the permutation we want. What happens for $k = 3$ is recorded in Table 9.3, where the first row contains the numbers 0 through 7, the second row their binary representations, the third row the bit reversals, and the fourth row the decimal equivalents of these reversed representations. Reading only the top and bottom rows, the permutation $\mathrm{Rev}_3$ of $0, 1, \ldots, 7$ swaps 1 with 4 and 3 with 6.


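TABLE 9.3. The bit reversal permutation on 0, ..., 7

j             | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7
binary        | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111
bits reversed | 000 | 100 | 010 | 110 | 001 | 101 | 011 | 111
Rev_3(j)      | 0   | 4   | 2   | 6   | 1   | 5   | 3   | 7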


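For any $k$ the bit reversal permutation $\mathrm{Rev}_k$ of $\{0, 1, \ldots, 2^k - 1\}$ is obtained by reversing the bits in this way. The definition of $\mathrm{Rev}_k$ immediately gives that $\mathrm{Rev}_1$ is the identity and, since the last bit of $\mathrm{Rev}_k(j)$ is the leading bit of $j$, that

$$\mathrm{Rev}_k(j) \text{ is an even integer} \iff 0 \le j < 2^{k-1}. \tag{9.15}$$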
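Computing $\mathrm{Rev}_k$ is a short loop over the $k$ bits, so each value costs $O(k)$ and the whole table costs $O(n \log n)$. The Python sketch below uses names of our own choosing.

```python
def rev(j, k):
    """Rev_k(j): reverse the k-bit binary representation of j (0 <= j < 2**k)."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (j & 1)   # move the lowest remaining bit of j onto r
        j >>= 1
    return r

def rev_table(k):
    """The permutation Rev_k as a list: position j holds Rev_k(j)."""
    return [rev(j, k) for j in range(2 ** k)]

print(rev_table(2))   # [0, 2, 1, 3]
print(rev_table(3))   # [0, 4, 2, 6, 1, 5, 3, 7], the bottom row of Table 9.3
```

Applying `rev` twice returns the original index, and `rev(j, k)` is even exactly when `j < 2**(k-1)`, in agreement with (9.15).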


Application of $\mathrm{Rev}_k$ to the column numbers of the $2^k \times 2^k$ identity matrix gives a permutation matrix, which we’ll denote by $P_k$. You can check that the first three are

$$P_1 = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1 \end{pmatrix}, \quad P_3 = \begin{pmatrix} 1&0&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&0\\ 0&0&1&0&0&0&0&0\\ 0&0&0&0&0&0&1&0\\ 0&1&0&0&0&0&0&0\\ 0&0&0&0&0&1&0&0\\ 0&0&0&1&0&0&0&0\\ 0&0&0&0&0&0&0&1 \end{pmatrix}.$$

Because reversing the bits twice restores the original number, $\mathrm{Rev}_k$ is its own inverse, so $P_k$ is its own inverse; for $k \ge 2$ it has order two in the group of invertible $2^k \times 2^k$ matrices. Applying the permutation matrix $P_k$ on the right of $V(w)$ gives $F(w) = V(w)P_k$, the matrix obtained by permuting the columns of $V(w)$ by $\mathrm{Rev}_k$. For instance, $F(i)$ is the matrix in (9.14).

We refer to $F(w)$ as the Fourier matrix of size $2^k$. Since the $(i,j)$th entry of $V(w)$ is

$$v_{ij} = w^{ij},$$

the $(i,j)$th entry of $F(w)$ is

$$f_{ij} = w^{i\,\mathrm{Rev}_k(j)}.$$

We use this to show that $F(w)$ always has the form we’ve already noticed for $F(i)$.


Theorem 9.5.1. Let $w$ be any primitive $2^k$th root of unity. Then $F(w)$ satisfies the recurrence

$$F(w) = \begin{pmatrix} F(w^2) & DF(w^2)\\ F(w^2) & -DF(w^2) \end{pmatrix},$$

where $D$ is the diagonal matrix with diagonal entries $1, w, \ldots, w^{2^{k-1}-1}$.
Proof. The indices in the upper left quadrant of $F(w)$ satisfy $0 \le i, j < 2^{k-1}$, and (9.15) implies that $\mathrm{Rev}_k(j)$ is even for such $j$. Because dividing $\mathrm{Rev}_k(j)$ by 2 removes the least significant bit, $\mathrm{Rev}_k(j)/2 = \mathrm{Rev}_{k-1}(j)$, so

$$f_{ij} = w^{i\,\mathrm{Rev}_k(j)} = (w^2)^{i\,\mathrm{Rev}_{k-1}(j)},$$

and this proves that the upper left quadrant of $F(w)$ is $F(w^2)$.

In either lower quadrant, the row index satisfies $2^{k-1} \le i < 2^k$, which we can write as $i = I + 2^{k-1}$ for some $0 \le I < 2^{k-1}$. From this, and since $w^{2^{k-1}} = -1$,

$$f_{ij} = w^{i\,\mathrm{Rev}_k(j)} = w^{I\,\mathrm{Rev}_k(j)} \cdot w^{2^{k-1}\mathrm{Rev}_k(j)} = f_{Ij} \cdot (-1)^{\mathrm{Rev}_k(j)}.$$

Because $\mathrm{Rev}_k(j)$ is even in both left quadrants and odd in both right quadrants, this proves that the two left quadrants are the same and also that the matrices that form the two right quadrants are negatives of each other.

It remains to show that the upper right quadrant is $DF(w^2)$. In that quadrant the indices satisfy $0 \le i < 2^{k-1} \le j$, and $j = J + 2^{k-1}$ for some $0 \le J < 2^{k-1}$. Since $\mathrm{Rev}_k(j) = 2\,\mathrm{Rev}_{k-1}(J) + 1$,

$$f_{ij}\,w^{-i} = w^{i(\mathrm{Rev}_k(j) - 1)} = (w^2)^{i\,\mathrm{Rev}_{k-1}(J)},$$

which is the $(i, J)$th entry of $F(w^2)$. Since $w^i$ is the $i$th diagonal entry of $D$, $f_{ij}$ is the $(i, J)$th entry of $DF(w^2)$. This completes the proof. □
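Theorem 9.5.1 can also be checked numerically for small $k$. The Python sketch below (names ours) builds $F(w)$ entrywise from $f_{ij} = w^{i\,\mathrm{Rev}_k(j)}$ for the principal root $w = e^{2\pi i/2^k}$, uses the fact that $w^2$ is then the principal $2^{k-1}$th root, and compares all four quadrants with $F(w^2)$ and $\pm DF(w^2)$ up to floating-point tolerance. A numerical check like this illustrates the theorem but of course does not replace the proof.

```python
import cmath

def rev(j, k):
    """Rev_k(j): reverse the k-bit binary representation of j."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (j & 1)
        j >>= 1
    return r

def fourier_matrix(k):
    """F(w) = V(w) P_k for the principal 2**k-th root w; entry (i, j) is w**(i*Rev_k(j))."""
    n = 2 ** k
    return [[cmath.exp(2j * cmath.pi * ((i * rev(j, k)) % n) / n) for j in range(n)]
            for i in range(n)]

def check_theorem_9_5_1(k, tol=1e-9):
    """Compare the quadrants of F(w) with F(w**2) and D F(w**2), for k >= 1."""
    n, h = 2 ** k, 2 ** (k - 1)
    F = fourier_matrix(k)
    G = fourier_matrix(k - 1)                  # F(w**2)
    for i in range(h):
        d = cmath.exp(2j * cmath.pi * i / n)   # D[i][i] = w**i
        for j in range(h):
            quadrants_ok = (abs(F[i][j] - G[i][j]) < tol                   # upper left:  F(w^2)
                            and abs(F[i + h][j] - G[i][j]) < tol           # lower left:  F(w^2)
                            and abs(F[i][j + h] - d * G[i][j]) < tol       # upper right: D F(w^2)
                            and abs(F[i + h][j + h] + d * G[i][j]) < tol)  # lower right: -D F(w^2)
            if not quadrants_ok:
                return False
    return True

print(all(check_theorem_9_5_1(k) for k in range(1, 7)))   # True
```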


We will next show that the recursive structure of this Fourier matrix 
yields a divide-and-conquer strategy for both EVALUATION and INTER- 
POLATION. 


9.5.3 Fast evaluation and fast interpolation


Recording the polynomial $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1}$ as the coefficient vector $A = (a_0, \ldots, a_{n-1})^T$, we recall that (9.12) can be used to obtain the evaluation vector $Y = (f(1), f(w), \ldots, f(w^{n-1}))^T$. Since $P_k$ is its own inverse, $V(w) = F(w)P_k$, and we have encoded this in the matrix equation

$$Y = F(w)P_kA, \tag{9.16}$$

and the reverse interpolation process given in (9.13) can be written as

$$A = \frac{1}{n}\,\overline{F(w)P_k\overline{Y}}. \tag{9.17}$$

So, we see that INTERPOLATION is essentially the same process as EVALUATION! This means that the analyses of INTERPOLATION and EVALUATION are the same.

To analyze EVALUATION, we first consider the time required to construct $P_kA$. Computing $\mathrm{Rev}_k(j)$ for any $0 \le j < 2^k = n$ can be done in time proportional to the length $k$, and computing the permutation $\mathrm{Rev}_k$ takes time $O(n \log n)$. Once $\mathrm{Rev}_k$ is known, applying it to $A$ amounts to swapping pairs of indices (or pointers), and this takes time $O(n)$. Therefore, the time for finding $X = P_kA$ is $O(n \log n)$.³

The decomposition of the Fourier matrix $F(w)$ given in Theorem 9.5.1 suggests a divide-and-conquer strategy for computing $Y = F(w)X$. To see this, we write any $X \in \mathbb{C}^{2^k}$ as $X = \begin{pmatrix} X_1\\ X_2 \end{pmatrix}$, where $X_1$ is the vector consisting of the first $2^{k-1}$ components of $X$, and $X_2$ contains the last $2^{k-1}$ components. The special form of $F(w)$ gives

$$F(w)X = \begin{pmatrix} F(w^2) & DF(w^2)\\ F(w^2) & -DF(w^2) \end{pmatrix}\begin{pmatrix} X_1\\ X_2 \end{pmatrix} = \begin{pmatrix} F(w^2)X_1 + DF(w^2)X_2\\ F(w^2)X_1 - DF(w^2)X_2 \end{pmatrix}.$$

³Many texts ignore the time necessary to compute the permutation, since in practice it takes very little time compared to the rest of the algorithm. Even if we had to apply a permutation in every iteration, the asymptotic run time is not affected (for this, refer to Exercise 9.20).

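Noting that the products $F(w^2)X_1$ and $F(w^2)X_2$ need only be computed once, we conclude that this decomposition gives a divide-and-conquer algorithm for computing $F(w)X$. A schematic description is

$$A \xrightarrow{\text{permute}} P_kA = \begin{pmatrix} X_1\\ X_2 \end{pmatrix} \xrightarrow{\text{half-size}} \begin{pmatrix} F(w^2)X_1\\ F(w^2)X_2 \end{pmatrix} \xrightarrow{\text{combine}} \begin{pmatrix} F(w^2)X_1 + DF(w^2)X_2\\ F(w^2)X_1 - DF(w^2)X_2 \end{pmatrix} = Y,$$

and likewise, INTERPOLATION can be described by the schematic

$$\begin{aligned} Y &\xrightarrow{\text{conjugate and permute}} P_k\overline{Y} = \begin{pmatrix} X_1\\ X_2 \end{pmatrix} \xrightarrow{\text{half-size}} \begin{pmatrix} F(w^2)X_1\\ F(w^2)X_2 \end{pmatrix} \xrightarrow{\text{combine}} \begin{pmatrix} F(w^2)X_1 + DF(w^2)X_2\\ F(w^2)X_1 - DF(w^2)X_2 \end{pmatrix} = V(w)\overline{Y}\\ &\xrightarrow{\text{conjugate and scale by } 1/n} \frac{1}{n}\overline{V(w)}\,Y = A. \end{aligned}$$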

Note that in this process the matrix $F(w)$ never needs to be calculated, because all Fourier matrices in the scheme are simply recursive calls to a procedure. Also, the diagonal entries of $D$, which are powers of $w$, have been stored in an array and can be pulled from the array when necessary.

We’ve already shown that the permutation steps can be done in $O(n \log n)$ time, and we now compute the run time of the divide-and-conquer part. Since the combine step involves multiplying by a diagonal matrix and then performing an addition and a subtraction, this step takes $O(n)$ operations. Because the problem has been divided into two half-size subproblems, the divide-and-conquer recurrence has $a = c = 2$ and $m = 1$, so $T(n) = 2T(n/2) + bn$, which is the second case of the divide-and-conquer formula in Theorem 9.4.1, and so $T(n) = O(n \log n)$. Finally, when $n$ is not a power of two, appending zero coefficients (which at most doubles $n$) just increases the implied constant in $O(n \log n)$. This completes the proof that the FFT multiplies two polynomials of degree at most $n-1$ in $O(n \log n)$ operations.


9.5.4 The fast polynomial multiplication algorithm


The entire algorithm for computing the coefficients of the product of two 
polynomials is summarized in the following table. 
FFT Polynomial Multiplication
Let $f, g \in \mathbb{C}[x]$ be polynomials with $\deg(f), \deg(g) < n$.
1. Place the coefficients of the polynomial $f$ in the first $n$ components of a vector $A$ of length $2^{k+1}$, where $2^k$ is the smallest power of 2 that is at least $n$. Set the remaining components of $A$ to zero.
2. Place the coefficients of $g$ into a vector $B$ of length $2^{k+1}$ in the same way.
3. Permute both $A$ and $B$, obtaining $X$ and $Y$ respectively.
4. For the principal $2^{k+1}$th root of unity $w$, use the divide-and-conquer algorithm to compute both $F(w)X$ and $F(w)Y$.
5. Let $Z$ be the componentwise product of these two vectors.
6. Permute and conjugate $Z$.
7. Use the divide-and-conquer algorithm to compute $F(w)Z$.
8. Conjugate each component of the result in Step 7 and divide by $2^{k+1}$. This is the vector of coefficients in the product $fg$.


The computation is only approximate, because of the round-off in floating 
point operations and because we can only use finite approximations to 
the powers of w. The first 2n — 1 components of the output vector are 
approximations to the coefficients of the product polynomial. Although all 
higher coefficients in the product f g are zero, some of these components 
may be non-zero in the output. The size of these extra components gives 
some indication of the accuracy of the computation. 
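The whole procedure fits in a short Python sketch. It is an illustration rather than a tuned implementation: the names are our own, it uses floating-point complex arithmetic (so the output carries round-off, as just described), it takes constant polynomials as the base case, and it returns only the first $\deg(f) + \deg(g) + 1$ components of the output vector.

```python
import cmath

def rev(j, k):
    """Rev_k(j): reverse the k-bit binary representation of j."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (j & 1)
        j >>= 1
    return r

def fourier_apply(x, roots, step):
    """Return F(w) x via Theorem 9.5.1, where w = roots[step] has order len(x)."""
    n = len(x)
    if n == 1:                                   # base case: F(1) is the 1 x 1 matrix (1)
        return list(x)
    h = n // 2
    y1 = fourier_apply(x[:h], roots, 2 * step)   # F(w^2) X_1
    y2 = fourier_apply(x[h:], roots, 2 * step)   # F(w^2) X_2
    dy2 = [roots[i * step] * y2[i] for i in range(h)]   # D F(w^2) X_2, D = diag(w^i)
    return [u + v for u, v in zip(y1, dy2)] + [u - v for u, v in zip(y1, dy2)]

def evaluate(a):
    """EVALUATION (9.16): Y = F(w) P_k A, so Y[j] = f(w**j); len(a) must be 2**k."""
    n = len(a)
    k = n.bit_length() - 1
    roots = [cmath.exp(2j * cmath.pi * t / n) for t in range(n)]   # table of powers of w
    x = [a[rev(j, k)] for j in range(n)]                           # X = P_k A
    return fourier_apply(x, roots, 1)

def interpolate(y):
    """INTERPOLATION (9.17): A = (1/n) conj(F(w) P_k conj(Y))."""
    n = len(y)
    return [v.conjugate() / n for v in evaluate([v.conjugate() for v in y])]

def fft_multiply(f, g):
    """Approximate coefficients of f*g (coefficient lists, constant term first)."""
    n = max(len(f), len(g))
    size = 2
    while size < 2 * n:      # size = 2**(k+1), where 2**k is the least power of 2 >= n
        size *= 2
    a = list(f) + [0] * (size - len(f))                       # Step 1
    b = list(g) + [0] * (size - len(g))                       # Step 2
    z = [u * v for u, v in zip(evaluate(a), evaluate(b))]     # Steps 3-5
    c = interpolate(z)                                        # Steps 6-8
    return c[: len(f) + len(g) - 1]   # the remaining components should be near zero

print([round(c.real, 10) for c in fft_multiply([2, 1], [2, 1])])   # [4.0, 4.0, 1.0]
```

For $f = g = 2 + x$ this reproduces the coefficients 4, 4, 1 of the example worked by hand later in this section.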

Schematically, the algorithm is

COEFFICIENTS      VALUES                               VALUES OF PRODUCT    COEFFICIENTS OF PRODUCT
X → [Evaluate] ↘
                  [Componentwise Multiply]  →  [Interpolate]  →  Z
Y → [Evaluate] ↗

FIGURE 9.3. Schematic representation of FFT polynomial multiplication.


Any divide-and-conquer algorithm must specify what is meant by “small”, the size of problem that is computed “by hand”. In this algorithm, we could take as the “small case” the constant polynomials, compute their FFT using no operations, and proceed to the combine step. Other variations are possible. Since $f(\pm 1)$, $g(\pm 1)$, $f(\pm i)$, and $g(\pm i)$ can all be computed using no multiplications, two-component vectors or four-component vectors can also be conveniently used as the base case. The choice of base case does not affect the Big Oh order of the run time, but this choice does affect the actual run time by changing the constant hidden by the Big Oh.

We illustrate the FFT algorithm by applying it to the simple example of squaring the polynomial $2 + x$. (In the following, we will write the vectors as rows, even though the description of our algorithm uses column vectors.) In Step 1, we form the vector $(2, 1, 0, 0)$, and Step 2 is the same for this example. In Step 3 we use $\mathrm{Rev}_2$ to permute the vectors and obtain $(2, 0, 1, 0)$. In Step 4 the recursive routine is applied to the two vectors, which we break into the two vectors $(2, 0)$ and $(1, 0)$. Since

$$F(-1) = \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix},$$

premultiplication of $(2, 0)$ and $(1, 0)$ by $F(-1)$ gives $(2, 2)$ and $(1, 1)$. We next premultiply $(1, 1)$ by the diagonal matrix with 1 and $i$ on the diagonal, yielding $(1, i)$. We now add and subtract this from $(2, 2)$ and place the results in a vector with four components and get

$$(2 + 1,\ 2 + i,\ 2 - 1,\ 2 - i) = (3,\ 2 + i,\ 1,\ 2 - i).$$

These numbers are supposed to be the values of the polynomial $f(x) = 2 + x$ at $x = 1$, $i$, $-1$, and $-i$, which indeed they are.

In Step 5 we multiply the two (in our case identical) “value vectors” componentwise. This amounts to squaring each component of the vector above, and this yields

$$Z = (9,\ 3 + 4i,\ 1,\ 3 - 4i),$$

whose coordinates are the values of the polynomial $4 + 4x + x^2$ at $x = 1$, $i$, $-1$, and $-i$. In Step 6 we permute using $\mathrm{Rev}_2$ and conjugate to get the vector

$$(9,\ 1,\ 3 - 4i,\ 3 + 4i).$$

In Step 7 we apply the same recursive algorithm, starting at the above vector. It splits into $(9, 1)$ and $(3 - 4i, 3 + 4i)$, and these become $(10, 8)$ and $(6, -8i)$ in the base case of the recursion. We multiply $(6, -8i)$ by the diagonal matrix with 1 and $i$ on its diagonal to get $(6, 8)$, and then $(10, 8)$ and $(6, 8)$ are added, subtracted, and placed in the vector

$$(10 + 6,\ 8 + 8,\ 10 - 6,\ 8 - 8) = (16, 16, 4, 0).$$

In Step 8 we conjugate (which has no effect, since our values at this point are real) and divide by $2^2 = 4$ to get the vector

$$(4, 4, 1, 0),$$

the coefficient vector for $f(x)g(x) = (2 + x)^2 = 4 + 4x + x^2$.
Notice that the FFT multiplies complex polynomials. Some improvement is possible if all coefficients in both polynomials are real. One minor change is that it is never necessary to conjugate in Step 8, since the result of Step 7 is then real (as in the example above). More significantly, since the evaluation is done at $2^{k+1} \approx 2n$ complex points, one can pack a pair of real numbers into one complex number and design a real FFT, which evaluates at only $2^k$ instead of $2^{k+1}$ complex points. (For details on this, refer to Exercise 9.21 and [2].)


9.6 Average Case Analysis 


Until now we’ve considered the run time of an algorithm to be a function of 
the input size. In practice, an algorithm might treat all inputs of the same 
size in the same way or it could handle some inputs more quickly than 
others. Because of this, we define the best case, worst case, and average 
case run times for an algorithm. The maximum run time over all inputs of 
the same size is called the worst case run time, and the minimum run 
time over all inputs of the same size is called the best case run time. 
Averaging the run time over all inputs of the same size is called the average 
case run time. When the probability associated with each of the various 
inputs of a particular size is unknown, it is difficult to calculate the average 
case time. For definiteness and simplicity of calculating the average case, 
it’s often assumed that each input of a fixed size is equally likely to occur. 


9.6.1 The LARGETWO algorithm 


Consider the following algorithm, which finds the two largest entries in a 
one-dimensional array and assigns these values to the variables FIRST and 
SEC. 


PROCEDURE LARGETWO(C)
FIRST := C[1]
SEC := C[2]
FOR I = 2 TO n DO
    IF C[I] > FIRST
    THEN SEC := FIRST; FIRST := C[I]
    ELSE IF C[I] > SEC
         THEN SEC := C[I]


Note that this algorithm assumes that the array has at least two compo- 
nents, and so the results are unpredictable when it is used with a one- 
element array. 
The ground rules for our run time analysis are that we'll count the number
of comparisons of array entries and ignore any comparisons of numbers
used to control the FOR loop. Each time through the FOR loop the
algorithm makes either one or two comparisons, and so for an n-element
array the algorithm uses at least $n-1$ comparisons and at most $2(n-1)$
comparisons. If the data happen to be arranged in increasing order,

$$C[1] < C[2] < \cdots < C[n],$$


the algorithm performs exactly $n-1$ comparisons, and this implies that
the best case run time is $B(n) = n-1$. The algorithm makes $2(n-1)$
comparisons when $C[1]$ is the largest entry, and this is the worst case, with
run time $W(n) = 2(n-1)$.

What about the average case? Is it close to worst case, close to best case,
or midway between the worst and best cases? Let $A(n)$ be the number of
comparisons used on average by this algorithm. Since the algorithm begins
with the first entry and proceeds in one direction across the array, we may
reasonably assume that for the first $n-1$ entries the algorithm on average
uses $A(n-1)$ comparisons, the same number it uses if the last entry weren't
there. For the last entry, it uses at least one comparison. If $C[n]$ is not the
largest entry, then $C[n] >$ FIRST is false, and a second comparison must
be made. If $C[n]$ is the largest entry, then $C[n] >$ FIRST is true, and
only that one comparison is made. By our assumption of uniformity, the
probability that $C[n]$ is the largest is $1/n$, and from these considerations,
for all $n > 2$,

$$A(n) = A(n-1) + 2\left(1 - \frac{1}{n}\right) + 1 \cdot \frac{1}{n} = A(n-1) + 2 - \frac{1}{n}. \tag{9.18}$$

When the array has two entries, uniformity implies that the maximum entry
is equally likely to lie in either entry, and so two comparisons are required
exactly half of the time, giving $A(2) = 3/2$. Since (9.18) is a first-order
recurrence with eigenvalue $\lambda = 1$ and forcing function $w(i) = 2 - 1/i$, the
solution is

$$A(n) = \sum_{i=2}^{n} \left(2 - \frac{1}{i}\right) = 2(n-1) - \sum_{i=2}^{n} \frac{1}{i} \quad \text{for all } n \ge 2.$$


Using the result on the partial sums of the harmonic series given in
Exercise 8.39, $A(n)$ is therefore asymptotically approximated by
$2(n-1) - \ln(n) + c$, where $c$ is some constant. From this we see that the
average case behavior of the algorithm is very close to the worst case,
$W(n) = 2(n-1)$, especially as compared with the best case run time,
$B(n) = n-1$. Also, notice that the average of the worst and best case run
times is $\frac{3}{2}(n-1)$, a severe underestimate of the actual average case behavior.
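The formula for $A(n)$ can be checked by brute force for small $n$. The sketch below is our own illustration, not part of the text. It runs LARGETWO on every ordering of the distinct keys $0, \ldots, n-1$, which is one way to realize the uniformity assumption. It counts only comparisons of array entries, following the ground rules above, and compares the exact average with $2(n-1) - \sum_{i=2}^{n} 1/i$. It also confirms $B(n) = n-1$ and $W(n) = 2(n-1)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def largetwo_comparisons(c):
    """Count the array-entry comparisons LARGETWO makes on the sequence c."""
    first, sec = c[0], c[1]          # FIRST := C[1]; SEC := C[2]
    count = 0
    for x in c[1:]:                  # FOR I := 2 TO n
        count += 1                   # the test C[I] > FIRST
        if x > first:
            sec, first = first, x
        else:
            count += 1               # the test C[I] > SEC
            if x > sec:
                sec = x
    return count

def largetwo_closed_form(n):
    """A(n) = 2(n-1) - sum_{i=2}^{n} 1/i, valid for n >= 2."""
    return 2 * (n - 1) - sum(Fraction(1, i) for i in range(2, n + 1))

for n in range(2, 8):
    counts = [largetwo_comparisons(p) for p in permutations(range(n))]
    assert min(counts) == n - 1 and max(counts) == 2 * (n - 1)   # B(n), W(n)
    average = Fraction(sum(counts), len(counts))
    assert average == largetwo_closed_form(n)
    print(n, average)
```

Exact rational arithmetic (`Fraction`) is used so that the check is an equality, not an approximate match.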

Here's another analysis of the average case run time for LARGETWO.
Let $C[j]$ be the maximal entry. From our assumption of uniformity, the
probability that $C[j]$ is the largest entry is $1/n$. As noted above, $2(n-1)$
comparisons are made for $j = 1$. For $j \ge 2$ the algorithm performs $A(j-1)$
comparisons before the loop reaches $I = j$ (where we set $A(1) = 0$, since
no comparisons are made before the loop reaches $I = 2$). Then for $I = j$ the
loop makes one comparison, and for each of $I = j+1, \ldots, n$ two
comparisons are used. This gives

$$nA(n) = 2(n-1) + \sum_{j=2}^{n} \bigl[A(j-1) + 1 + 2(n-j)\bigr].$$

Expanding the sum gives

$$nA(n) = \sum_{j=1}^{n-1} A(j) + (n-1) + (n-1)n.$$


Therefore, for all $n \ge 2$,

$$nA(n) = \sum_{j=1}^{n-1} A(j) + n^2 - 1, \tag{9.19}$$


a recurrence that depends on all previous terms. Similarly to some
recurrences in Chapter 4, its solution can be shown to satisfy a first-order
recurrence. Since, for $n \ge 3$,

$$(n-1)A(n-1) = \sum_{j=1}^{n-2} A(j) + (n-1)^2 - 1,$$

subtraction of equations gives

$$nA(n) - (n-1)A(n-1) = A(n-1) + n^2 - (n-1)^2 = A(n-1) + 2n - 1,$$

$$nA(n) = nA(n-1) + 2n - 1,$$

which, after division by $n$, is the same recurrence found earlier in (9.18).
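As a quick check that the full-history recurrence (9.19) and the first-order recurrence (9.18) define the same sequence, the sketch below iterates both in exact rational arithmetic. It takes $A(1) = 0$ (no comparisons are made on a one-element prefix) as its starting value. The function names are our own.

```python
from fractions import Fraction

def a_from_918(n_max):
    """A(1..n_max) from (9.18): A(n) = A(n-1) + 2 - 1/n, with A(1) = 0."""
    a = {1: Fraction(0)}
    for n in range(2, n_max + 1):
        a[n] = a[n - 1] + 2 - Fraction(1, n)
    return a

def a_from_919(n_max):
    """A(1..n_max) from (9.19): n A(n) = sum_{j=1}^{n-1} A(j) + n^2 - 1."""
    a = {1: Fraction(0)}
    for n in range(2, n_max + 1):
        a[n] = (sum(a[j] for j in range(1, n)) + n * n - 1) / n
    return a

assert a_from_918(25) == a_from_919(25)
print(a_from_918(4)[4])   # 59/12
```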


9.6.2 The QUICKSORT algorithm


Next we consider QUICKSORT, an algorithm whose average case analysis
is more complicated. The algorithm sorts an array S of numbers according
to increasing size (and we allow repeated entries in S). The output of the
algorithm is a sorted string of numbers, and the dots in the RETURN
statement indicate concatenation of strings.


PROCEDURE QUICKSORT(S)
IF S is empty THEN RETURN the empty string
Pick an entry a of S at random.
Divide S into three parts, S1, S2, S3, where
    S1 is the set of entries of S that are less than a,
    S2 is the set of entries of S that equal a,
    S3 is the set of entries of S that are greater than a.
RETURN (QUICKSORT(S1) · S2 · QUICKSORT(S3))


This is a random algorithm, because it uses a random choice at some point,
in this case for the choice of the entry a. The choice could be randomized
by using a random number generator, but usually a rule is used in the hope
that the distribution of the input data is not significantly correlated with
the choice of rule. We'll assume that this correlation is low enough to allow
us to treat the a's generated by the rule as truly random. Three popular
rules for choosing a are: always choose the first entry; always choose the
last entry; always choose the middle entry. Another popular method is to
choose the median of the first, last, and middle entries.
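A direct Python rendering of the three-way pseudocode may help. This is a sketch, not the book's implementation. The random choice of $a$ is made with a pseudo-random generator (`rng.choice`), which is an assumption standing in for "pick an entry at random". The function returns a new list rather than sorting in place.

```python
import random

def quicksort(s, rng=random):
    """Three-way QUICKSORT following the pseudocode; returns a new sorted list."""
    if not s:                              # base case: empty input
        return []
    a = rng.choice(s)                      # pick an entry a of S at random
    s1 = [x for x in s if x < a]           # entries less than a
    s2 = [x for x in s if x == a]          # entries equal to a
    s3 = [x for x in s if x > a]           # entries greater than a
    return quicksort(s1, rng) + s2 + quicksort(s3, rng)   # concatenation

data = [5, 3, 8, 3, 1, 9, 2, 8]
print(quicksort(data, random.Random(2024)))   # [1, 2, 3, 3, 5, 8, 8, 9]
```

Putting all copies of $a$ into $S_2$ handles repeated entries. Passing a seeded `random.Random` makes the pivot sequence reproducible, while the sorted output is the same for any seed.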
The run time of this algorithm on an array S of size n satisfies


$$T(S) = T(S_1) + bn + T(S_3),$$


where $b$ accounts for the time used to place entries into the three sets and
return the answer. The worst case occurs when $S_1$ or $S_3$ contains $n-1$
entries and

$$W(n) = W(n-1) + bn,$$

giving

$$W(n) = W(1) + b\sum_{i=2}^{n} i = W(1) - b + \frac{b}{2}\,n(n+1) = \Theta(n^2).$$

The best case for QUICKSORT occurs when $S_1$ and $S_3$ each contain
approximately $n/2$ entries. Then


$$B(n) = 2B(n/2) + bn.$$


This is a divide-and-conquer recurrence with $a = c = 2$
and $m = 1$, and $B(n) = \Theta(n \log n)$.

So, there is a large gap between the worst case, $W(n) = \Theta(n^2)$, and the
best case, $B(n) = \Theta(n \log n)$. Does the algorithm on average behave closer
to worst case or closer to best case? Notice that if the average case run time
were the numerical average of worst case and best case, then the average
would be $\Theta(n^2)$ and therefore close to the worst case. We show that the
opposite occurs, and find that on average, the run time of QUICKSORT is
$\Theta(n \log n)$ and hence behaves close to best case. This is intuitively plausible
because nearly even splits should on average occur much more frequently
than one-sided splits.

As usual, for $n \ge 1$ let $A(n)$ be the average case run time for an array
with $n$ entries and assume that $a$ is equally likely to be any of the entries.
Letting $A(0) = 0$, for $n \ge 1$,

$$A(n) = \frac{1}{n}\sum_{j=1}^{n} \bigl[A(j-1) + A(n-j)\bigr] + b(n+1),$$

where $b(n+1)$ is the time for split and combine (and is written in this form
because it slightly simplifies the computation). Therefore,


$$nA(n) = \sum_{j=1}^{n} A(j-1) + \sum_{j=1}^{n} A(n-j) + bn(n+1) = 2\sum_{j=0}^{n-1} A(j) + bn(n+1).$$


Using the technique of the previous example,

$$nA(n) - (n-1)A(n-1) = 2A(n-1) + 2bn,$$

giving

$$nA(n) = (n+1)A(n-1) + 2bn, \qquad \frac{A(n)}{n+1} = \frac{A(n-1)}{n} + \frac{2b}{n+1}.$$


Setting $Z(n) = A(n)/(n+1)$, this becomes


$$Z(n) = Z(n-1) + \frac{2b}{n+1},$$


a first-order recurrence with forcing function $w(j) = 2b/(j+1)$, giving

$$Z(n) = c_1 + 2b\sum_{j=1}^{n} \frac{1}{j+1}$$

and

$$A(n) = c_1(n+1) + 2b(n+1)\sum_{j=1}^{n} \frac{1}{j+1}$$

for some constant $c_1$ (in fact $c_1 = Z(0) = A(0) = 0$). Using Exercise 8.39
again gives $A(n) = \Theta(n \log n)$.
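To double-check the algebra, the sketch below computes $A(n)$ directly from the averaged recurrence, starting from $A(0) = 0$. It then compares the result with the closed form. Starting from $A(0) = 0$ forces the constant to be $c_1 = Z(0) = A(0) = 0$. The choice $b = 1$ is an arbitrary assumption; every term scales linearly in $b$. The function names are hypothetical.

```python
from fractions import Fraction

def quicksort_average_cost(n_max, b=Fraction(1)):
    """A(0..n_max) from A(n) = (1/n) sum_{j=1}^{n} [A(j-1) + A(n-j)] + b(n+1),
    starting from A(0) = 0."""
    a = [Fraction(0)]
    for n in range(1, n_max + 1):
        total = sum(a[j - 1] + a[n - j] for j in range(1, n + 1))
        a.append(total / n + b * (n + 1))
    return a

def quicksort_closed_form(n, b=Fraction(1)):
    """A(n) = c1 (n+1) + 2b(n+1) sum_{j=1}^{n} 1/(j+1), with c1 = 0."""
    return 2 * b * (n + 1) * sum(Fraction(1, j + 1) for j in range(1, n + 1))

a = quicksort_average_cost(40)
assert all(a[n] == quicksort_closed_form(n) for n in range(41))
print(a[10])
```

Because the closed form involves the harmonic sum $\sum 1/(j+1)$, printing `float(a[n]) / (n * math.log(n))` for growing $n$ is one informal way to watch the $\Theta(n \log n)$ behavior emerge.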
This analysis shows that the average case behavior of QUICKSORT is
close to the best case behavior, and suggests that QUICKSORT may be a
good method to use to sort an array. However, remember that the analysis
assumes that each permutation of the input is equally likely. If
that assumption were not true, then it isn't clear that QUICKSORT is a
good algorithm to use. For instance, what if the data were in order except
for a few items? If you always choose the first entry for a, then the splits
usually give one very large set, and it can be proved that the average run
time of QUICKSORT for this type of data has order $\Theta(n^2)$.

A lesson to be learned from these two examples is that, although the
average case always lies somewhere between the worst case and the best
case, it can be anywhere in that range. Further, in many practical cases
you'll have no idea about the distribution of the input data. The
assumption of a uniform probability distribution often simplifies the average
case calculation, but it may not match the distribution of inputs that
actually arise.


9.7 Exercises 


Ex 9.1. Use our perspective of moving the largest disk to derive a recur- 
rence for the minimal number of moves needed to solve the n-disk Towers 
of Hanoi puzzle. Solve your recurrence with appropriate initial conditions 
and conclude that any algorithm that solves the n-disk Towers of Hanoi 
puzzle has run time $\Omega(2^n)$. (Refer to Chapter 1 for the meaning of $\Omega$.)


Ex 9.2. Give an algorithm for the Towers of Hanoi puzzle that uses more 
than the minimal number of moves. Derive and solve a recurrence for your 
algorithm and show that it uses more moves than the minimum counted 
in the previous exercise. (If you don’t see the usefulness of constructing 
algorithms that take longer than necessary, consider the problem of The 
Cab Driver and the Tourist.) 


Ex 9.3. In the following algorithm, the towers are labeled 0, 1, and 2 rather 
than A, B, and C. The variable COUNT contains n bits numbered from 1 to 
n starting at the rightmost bit. The positions in COUNT alternate between 
odd and even (in that order). 
PROCEDURE TOWERS(n)
T := 0                  (*Tower number computed modulo 3*)
COUNT := 0
P := 1 if n is even
    -1 if n is odd
WHILE true DO
    Move Disk 1 from T to T+P
    T := T+P
    COUNT := COUNT+1
    IF COUNT = all 1's THEN exit
    IF Rightmost 0 in COUNT is in an even position
        THEN move disk from T-P to T+P
        ELSE move disk from T+P to T-P
    COUNT := COUNT+1
ENDWHILE


(a) Without using recurrences show that this algorithm uses the minimal 
number of moves. 

(b) Even if incrementing COUNT and finding the rightmost 0 takes time
cn for constant c, show that the run time of this algorithm is $\Theta(2^n)$.


Ex 9.4. (Refer to [19].) The following algorithm for the Towers of Hanoi 
problem has the towers arranged in a circle. By considering the situation 
in which the largest disk is moved, derive and solve a recurrence for the 
number of moves made and a recurrence for the run time of this algorithm. 


PROCEDURE
Move smallest disk one tower clockwise
WHILE a disk other than the smallest disk can be moved
    DO move that disk
       move the smallest disk one tower clockwise
ENDWHILE


(Notice that if the number n of disks is even, then the disks are moved one 
tower counterclockwise, while if n is odd, the disks are moved one tower 
clockwise.) 


Ex 9.5. Which of the three algorithms for the Towers of Hanoi puzzle
is fastest? The recursive algorithm and the algorithms from the last two
exercises all have run time $\Theta(2^n)$. To determine which is really fastest, we
suggest programming them and running them on a real computer. (You
may want to make a small modification in the algorithm from the last
exercise, always moving the disks to the same tower regardless of the parity
of n.) Can you explain why one algorithm might be slower than the others
in a real-world programming environment?


Ex 9.6. For a given one-dimensional array A, let A[i] denote the content of
the $i$th entry of A. Show that the following procedure has run time O(n) by
deriving and solving the appropriate recurrence. Describe in English what
the algorithm accomplishes.


PROCEDURE LARGEST(n)
IF n > 1 THEN
    LARGEST(n-1)
    IF A[n-1] > A[n]
        THEN TEMP := A[n]
             A[n] := A[n-1]
             A[n-1] := TEMP


Ex 9.7. Using the result of the last exercise, give an inductive proof that
the following algorithm sorts (A[1], A[2], ..., A[n]). Derive and solve a
recurrence for the run time.


PROCEDURE SORT(n)
IF n > 1 THEN
    LARGEST(n)
    SORT(n-1)


Ex 9.8. According to an old tale told by Edouard Lucas, the monks in 
a secret monastery in Hanoi are performing the moves for the Towers of 
Hanoi puzzle with 64 disks. When they finish, the world will end. If it takes 
them one minute to move a disk, should you worry about the end of the 
world? What if they can move one disk per second? After a wily computer 
salesperson convinces the monks to automate so that they can simulate 
moving a disk in one nanosecond, should you worry? (Also refer to the title 
story in The Nine Billion Names of God: The Best Short Stories of Arthur 
C. Clarke, Harcourt, Brace & World, New York, 1967.) 


Ex 9.9. Let us (mis)construe the Towers of Hanoi to mean that an order 
of disks on a peg is acceptable as long as the disk on the bottom is the 
largest. Give an algorithm that moves a single stack of n disks from one 
peg to another in a minimum number of moves, and find a formula for that 
minimum number of moves. The disks are sorted from largest on bottom 
to smallest on top at the start, and are to be sorted in the same order, but
on a different peg at the end (refer to [111]). 

The following secure locking system is used in the next four exercises. 
The locking system assumes that n locks are connected so that 


1. Lock 1 may be changed from locked to unlocked or from unlocked 
to locked at any time. 


2. For any $j > 1$, lock $j$ may be changed from locked to unlocked (or
vice versa) only if lock $j-1$ is locked and locks 1 through $j-2$ are
unlocked.


Ex 9.10. One strategy for designing such a system involves thinking of 
unlocking all the locks in terms of unlocking a subset of locks, unlocking 
the last lock, re-locking the subset, and repeating this process until all locks 
have been unlocked. This strategy naturally leads to a design in which you 
have two mutually recursive procedures (or one recursive procedure with a 
switch that indicates whether the procedure is locking or unlocking).

(a) Use this strategy to design a recursive algorithm that gives the se- 
quence of operations needed to unlock a series of n locks that are all 
initially locked. 

(b) Find a pair of recurrences that give the time and number of lock- 
ing/unlocking operations used by your algorithm. 

(c) Solve the recurrences in part (b). 


Ex 9.11. A second strategy rests on a fairly simple observation. The locks 
are in a specific configuration right after the last lock is unlocked, and to 
unlock the next-to-last lock, it is necessary to re-configure the locks. You 
can design a simple recursive procedure that takes the locks from the first 
configuration to the second configuration. This strategy leads to a design 
with two nested recursive procedures. 

(a) Use this strategy to design a recursive algorithm that gives the se- 
quence of operations needed to unlock a series of n locks that are all 
initially locked. 

(b) Find the pair of recurrences that give the run time and number of 
locking/unlocking operations used by your algorithm. 

(c) To solve the recurrences in part (b), observe that the first depends 
on the second, but the second does not depend on the first. Thus, 
you can solve the second equation and plug its solution into the first 
equation and solve. 


Ex 9.12. Note that the procedures in the previous two exercises use the 
same amount of time and number of moves because they do the same 
operations in reverse order. Show that this algorithm uses the minimal
number of locking/unlocking operations. 


Ex 9.13. To get a feel for the puzzle you've investigated in the last
problems, go to your local toy store and get SPIN-OUT®. This is a physical
embodiment of the puzzle with seven locks. Amaze your friends by solving 
SPIN-OUT® in the minimum number of moves. 


Ex 9.14. Give a divide-and-conquer algorithm for finding the largest entry 
in a vector with n components and show that your algorithm uses n — 1 
comparisons. 


Ex 9.15. Design a divide-and-conquer algorithm to find the two largest 
entries in an array. Show that your algorithm is correct and calculate the 
number of comparisons it uses. Give an example that shows that your 
algorithm uses more comparisons than necessary. 


Ex 9.16. Suppose you have a large number of coins and a two-pan balance 
whose pans are as large as needed. The balance tells you whether the coins 
in one pan weigh the same as the coins in the other pan or which set of 
coins is heavier. Among your coins is exactly one that has a different weight 
from the other coins. 
(a) Assuming that the number of coins is a power of 3, design a divide- 
and-conquer algorithm to find the odd coin. 
(b) Prove that your algorithm is correct. 
(c) Find and solve a recurrence for the number of times your algorithm 
uses the balance. 


Ex 9.17. Two finite strings $C_1$ and $C_2$ are said to commute if juxtaposing
in either order gives the same string (that is, $C_1C_2 = C_2C_1$). Give a
constructive proof that $C_1$ and $C_2$ commute iff there is a string $w$ and two
natural numbers $k_1$ and $k_2$ such that $C_1 = w^{k_1}$ and $C_2 = w^{k_2}$ (where $w^k$
denotes the juxtaposition of $k$ copies of $w$). Give a constructive proof that
such a $w$ exists. Use your proof to construct an algorithm that finds $w$ for
two commuting strings.


Ex 9.18. Show how a divide-and-conquer strategy with the clever identity 
(9.4) can be used to construct an algorithm for multiplying large integers 
in run time $O(n^{\log_2 3})$, where n is the number of bits in the larger factor.


Ex 9.19. Show that the power 


can be computed in O(n) bit operations. Which special properties of this 
matrix allow this “fast” algorithm? 
Ex 9.20. Consider a recurrence of the form $T(n) = 2T(n/2) + g(n)$, where
$g(n)$ is a nonnegative function. Argue that the asymptotic order of growth
of $T(n)$ is monotonic in $g(n)$ in the following sense: If $h(n) = O(g(n))$ and
$S(n) = 2S(n/2) + h(n)$, then $S(n) = O(T(n))$. Then, using the technique
of letting $t_k = T(2^k)$, show that $T(n) = O(n \log n)$ iff $g(n) = O(n \log n)$.


Ex 9.21. Set up and solve a recurrence for the number of real multipli- 
cations the FFT algorithm uses to multiply two real polynomials with n 
coefficients. 


Ex 9.22. (a) The number of multiplications used by the FFT depends 
on the size of the agreed upon “small-size” problem. Work out the 
difference in the number of multiplications among FFTs recursing to 
one component; two components; four components. 

(b) Compare your answers in part (a) with the $n^2$ multiplications used
by the standard algorithm for polynomial multiplication.

(c) Calculate the value of $n_0$ for which the FFT is faster than the
standard method for polynomials with $n > n_0$ coefficients. Do you think
that the FFT is useful in practice? If you had to multiply two
polynomials with $2^{10}$ coefficients, which method would be faster?

(d) Assume that the time for all operations used is dominated by the 
time for multiplication. What is your prediction for the ratio of the 
run times of the FFT and the standard algorithm? 


Ex 9.23. Do an average case analysis for the following procedure: 


PROCEDURE BIGTWO(C)
FIRST := C[1]
SEC := C[2]
FOR I := 2 TO n DO
    IF C[I] > SEC
        THEN IF C[I] > FIRST
            THEN SEC := FIRST
                 FIRST := C[I]
            ELSE SEC := C[I]


Is the average case run time for this algorithm nearer to its worst case or its 
best case? Which of the two procedures LARGETWO(C) or BIGTWO(C) 
would you use? 
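Ex 9.24. If the best case $B(n)$ of an algorithm satisfies

$$B\left(\frac{n}{2} - \gamma\right) + B\left(\frac{n}{2} + \gamma\right) + cn \le B(n) \le B\left(\frac{n}{2} - \alpha\right) + B\left(\frac{n}{2} + \alpha\right) + bn$$

for constants $\alpha, \gamma, b, c$, show that $B(n) = O(n \log n)$.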
Ex 9.25. Show that if the matrix C is initialized to the zero matrix, then
the following algorithm computes the $n \times n$ matrix product $C = A \times B$.


FOR i := 1 TO n DO
    FOR j := 1 TO n DO
        FOR k := 1 TO n DO
            c_ij := c_ij + a_ik * b_kj


Compute the run time of this algorithm. Explain why most computations
use this algorithm rather than Strassen's algorithm from Section 9.4.2.


Ex 9.26. Show that if multiplication of two n-bit numbers can be performed
in time $O(n^{1+\epsilon})$ for some $\epsilon > 0$, then division of two n-bit numbers
can also be carried out in time $O(n^{1+\epsilon})$. (Refer to Section 9.4.4.)


Ex 9.27. Show that multiplication of two n-bit numbers can be performed 
in the same time order as division of two n-bit numbers. 


Ex 9.28. Show that the square root of an n-bit number can be performed 
in the same time order as multiplication of two n-bit numbers. 


Ex 9.29. Compute the product of the Fourier matrices $F(i)$ and $F(-i)$ to
show that $F(\omega)^{-1} \ne F(\bar{\omega})$. Find a simple formula for $F(\omega)^{-1}$, the inverse
Fourier matrix.


10 


Some Nonlinear Recurrences 


10.1 Some Examples 


In previous chapters we have primarily discussed linear recurrences, or,
said another way, only recurrences involving linear operators on the space
of sequences. Recall that a function $L$ on the space of sequences is a linear
operator if it satisfies the following two conditions:


1. $L[x + y] = L[x] + L[y]$,
2. $L[cx] = cL[x]$,


for all sequences $x = (x_t)$ and $y = (y_t)$ and all constants $c$. (Notice that
a square matrix is a linear operator on the space of vectors of appropriate
size, since a matrix satisfies these conditions when $x$ and $y$ are any vectors
and $c$ is any scalar.) While the operation $+$ above is our usual addition
of sequences, one could consider replacing $+$ by other operations. We do
not follow this tack, but instead look at truly nonlinear equations. To keep
things simple, we consider one-dimensional equations of the form


$$x_{t+1} = f(x_t).$$


There is essentially only one linear one-dimensional equation; to
paraphrase Tolstoy, equations can be linear in only one way, but they can
be nonlinear in many different ways. So we should not expect to have a
general theory for nonlinear equations. Rather, we hope to have different
theories for different classes of nonlinear equations.
Let us consider the very simple nonlinear example

$$x_{t+1} = \frac{1}{x_t}.$$

If $x_0 = 2$, then $x_1 = 1/2$, $x_2 = 2$, $x_3 = 1/2$, and the sequence oscillates
with period 2. For general nonzero $x_0$, the sequence is $x_0, 1/x_0, x_0, 1/x_0, \ldots$,
and the sequence generally has period 2. The only exceptions are the fixed
points $x_0 = \pm 1$ and the point $x_0 = 0$, for which $1/x_0$ is undefined. (Often we
extend the reals to include $\infty$, and the undefined $1/0$ is taken to be
$1/0 = \infty$ and $1/\infty = 0$. Then 0 is also a periodic point with period 2
oscillation.) Even though this equation is nonlinear, from this analysis we
can guess that in some sense it is analogous to the linear equation


$$y_{n+1} = -y_n,$$


which generally has period 2, and 0 is the only fixed point. We continue
this analogy with linear equations in Section 10.6.
As another simple example, consider


$$x_{n+1} = \sqrt{x_n}.$$


For this equation to make sense, we assume that $x_0 \ge 0$ and that $\sqrt{x}$
returns the nonnegative square root of $x$. We can calculate some iterates,

$$x_1 = x_0^{1/2}, \quad x_2 = x_1^{1/2} = \bigl(x_0^{1/2}\bigr)^{1/2} = x_0^{1/4}, \quad x_3 = x_2^{1/2} = \bigl(x_0^{1/4}\bigr)^{1/2} = x_0^{1/8},$$

and see that the solution is

$$x_n = x_0^{1/2^n}.$$

There are four cases:
1. $x_0 = 0$, and then $x_n = 0$ for all $n \ge 0$;
2. $0 < x_0 < 1$, and then $1 > x_{n+1} > x_n > 0$;
3. $x_0 > 1$, and then $1 < x_{n+1} < x_n$;
4. $x_0 = 1$, and then $x_n = 1$ for all $n \ge 0$.


We summarize these cases by saying that 0 and 1 are fixed points of the
system, since $f(p) = p$ for $p = 0, 1$. The fixed point 0 is unstable, while 1
is a stable fixed point that attracts all solutions with $x_0 > 0$.
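The four cases can be observed numerically. The sketch below is our own illustration, not part of the text. It iterates $x_{n+1} = \sqrt{x_n}$ from one starting value in each case and checks each computed iterate against the closed form $x_n = x_0^{1/2^n}$, up to floating-point rounding. The helper name `iterate` is hypothetical.

```python
import math

def iterate(f, x0, steps):
    """Return the trajectory [x_0, x_1, ..., x_steps] of x_{n+1} = f(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

for x0 in (0.0, 0.25, 1.0, 16.0):        # one starting value from each case
    xs = iterate(math.sqrt, x0, 10)
    # agrees (up to rounding) with the closed form x_n = x0 ** (1 / 2**n)
    assert all(math.isclose(x, x0 ** (1 / 2**n)) for n, x in enumerate(xs))
    print(x0, xs[-1])
```

Running more steps from $x_0 = 0.25$ or $x_0 = 16$ shows the iterates creeping toward the attracting fixed point 1, while $x_0 = 0$ stays at the unstable fixed point 0.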

Our point, so far, is that some nonlinear equations are not difficult to
analyze. But there are nonlinear equations that are more difficult. Consider
the example

$$x_{n+1} = a\sin(x_n),$$

which can schematically be written as

$$x_n = a\sin(a\sin(\cdots(x_0)\cdots)),$$


a formula that gives no clear information about the behavior of the
solutions. Some things are easy to see. For example, if $x_0$ is an integer multiple
of $\pi$, then $x_1 = 0$ and $x_n = 0$ for all $n > 0$. On the other hand, it is
not clear what sort of solutions arise for other values of $x_0$. In particular,
what happens if $x_0$ is a small positive number? Is the solution attracted
to 0? Does the solution converge to some positive value? Does the solution
become periodic? In general, these questions are difficult to answer.
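Computation cannot answer these questions in general, but it can suggest answers for particular parameters. The sketch below is our own exploration, not from the text. It starts from a small positive $x_0 = 0.1$ and discards a transient of 1000 steps. It then prints the next few iterates of $x_{n+1} = a\sin(x_n)$ for a handful of values of $a$. Both the values of $a$ and the transient length are arbitrary choices.

```python
import math

def iterate_a_sin(a, x0, steps):
    """Return x_steps for the recurrence x_{n+1} = a*sin(x_n)."""
    x = x0
    for _ in range(steps):
        x = a * math.sin(x)
    return x

x0 = 0.1                                  # a small positive starting value
for a in (0.5, 1.5, 2.5, 3.5):            # illustrative parameter values only
    x = iterate_a_sin(a, x0, 1000)        # discard a transient
    tail = []
    for _ in range(4):                    # then record a few more iterates
        x = a * math.sin(x)
        tail.append(round(x, 6))
    print(a, tail)
```

For $0 < a < 1$ the iterates shrink toward 0, since $|a \sin x| < |x|$ for $x \ne 0$. For other values of $a$ the printout only hints at the behavior; it is not a proof that the pattern persists.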


10.2 Nonlinear Systems 


The preceding examples suggest that nonlinear difference equations may
be difficult to analyze. In the remainder of this chapter we want to
describe and give examples of some commonly used methods for analyzing
these equations. Of course, in this relatively short chapter we cannot cover
all techniques, and so we refer the interested reader to LaSalle [92],
Devaney [51], and Parker and Chua [125], which include more information.
To keep things simple we concentrate on the one-dimensional equation


$$x_{t+1} = f(x_t),$$


where $f(x)$ is a nonlinear function and $x$ is a real variable, which may be
restricted to the positive reals or to the extended reals or to some interval
on the real line. Such equations are sometimes called discrete dynamical
systems, because they are the discrete analogs of dynamical systems
occurring in physics. The function $f(x)$ is sometimes called a map to
emphasize that this is a discrete system. Notice that these equations have zero
input, and so the complexity comes from the nonlinear behavior of $f(x)$
rather than from an external source. Some complexity can come from the
choice of initial condition.

The simplest way to deal with such equations is computation. If $f(x)$ is a
reasonable function, then one should be able to write a computer program
that, when given an initial value $x_0$, can compute as many values of the
sequence $x_1, x_2, \ldots$ as one desires. There is the usual caveat that computers
do not compute with real real numbers, but instead they compute with
approximations to real numbers. So a sequence generated by a computer
may not be the sequence actually defined by the nonlinear recurrence. Of
course, for reasonable functions one hopes that the real and the computed
sequences are similar. In some cases this can be proved, but we do not
tackle this problem of approximation here.
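The caveat about approximate arithmetic is easy to see in practice. The sketch below is our own experiment, not from the text. It runs the quadratic model $x_{t+1} = x_t[1 + r(1 - x_t)]$ used for the figures below twice: once in ordinary floating point and once in 60-digit `decimal` arithmetic. The 60-digit run is only a more accurate approximation, not the exact sequence; treating it as a reference is an assumption. The function reports the first step at which the two computed values differ by more than 0.01. The parameters $r = 2.99$, $x_0 = 0.35$ follow Figure 10.1, and the tolerance is arbitrary.

```python
from decimal import Decimal, getcontext

def quadratic_step(x, r):
    """One step of the quadratic model x -> x[1 + r(1 - x)]."""
    return x * (1 + r * (1 - x))

def first_disagreement(steps=200, tol=0.01):
    """Iterate in float and in 60-digit Decimal; return the first t with
    |x_t(float) - x_t(Decimal)| > tol, or None if none occurs in `steps` steps."""
    getcontext().prec = 60                # note: changes the global decimal context
    xf, rf = 0.35, 2.99
    xd, rd = Decimal("0.35"), Decimal("2.99")
    for t in range(1, steps + 1):
        xf = quadratic_step(xf, rf)
        xd = quadratic_step(xd, rd)
        if abs(Decimal(xf) - xd) > Decimal(tol):
            return t
    return None

print(first_disagreement())
```

The two runs agree closely at first and then part company. How soon they part depends on the parameters and on the precision used.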

Since humans tend to understand pictures better than sequences of
numbers, these computer calculations are often presented as graphs. (These are
not the same kind of graphs we used in discussing nonnegative matrices.)
The most straightforward graph is a plot of $x_t$ as a function of the natural
number $t$. This is often called a time plot. Figure 10.1 shows a time plot
of the quadratic difference equation


$$x_{t+1} = x_t\bigl[1 + r(1 - x_t)\bigr]$$


with $r = 2.99$ and $x_0 = .35$. To make this plot look more like a continuous
function, it is not plotted as a sequence of points, but rather each point
$(t, x_t)$ is connected by a straight line to the next point $(t+1, x_{t+1})$. The
time plot shows that the elements of the sequence are positive and don't
get too big, but that they do jump around in an irregular manner.
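The data behind a time plot like Figure 10.1 takes only a few lines to compute. The sketch below is our own, using the same $r = 2.99$ and $x_0 = 0.35$. It computes the trajectory and prints a crude text version of the plot. Time runs down the page, and a `*` is placed in proportion to $x_t$. The plotting scale (values up to 1.4) is an assumption chosen to fit this trajectory.

```python
def trajectory(f, x0, steps):
    """Return [x_0, x_1, ..., x_steps] for x_{t+1} = f(x_t)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

def text_time_plot(xs, width=60, top=1.4):
    """Print one row per time step t, with '*' at a column proportional to x_t."""
    for t, x in enumerate(xs):
        col = int(round(x / top * (width - 1)))
        print(f"{t:3d} |" + " " * col + "*")

r = 2.99
xs = trajectory(lambda x: x * (1 + r * (1 - x)), 0.35, 40)
text_time_plot(xs)
```

With a plotting library available, one would instead draw `xs` against `range(len(xs))` with connecting line segments, as described for Figure 10.1.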


FIGURE 10.1. A time plot of a chaotic trajectory (quadratic model with 
r = 2.99). 


Another plotting technique may be more revealing. This technique is
called a web plot, and it is somewhat similar to the phase plane plot used
in differential equations. The idea is to plot $x_{t+1}$ as a function of $x_t$. While
this does give some information, it does not show a solution sequence. To
follow a solution it is useful to have both the curve $f(x)$ and the line $y = x$
plotted on the same axes. To follow a solution:


1. Start with $(x_0, f(x_0))$, which is a point on the curve.


2. Then draw a horizontal line from this point to the line $y = x$. The
point of intersection on the line is $(f(x_0), f(x_0))$, i.e., $(x_1, x_1)$.
3. Then draw a vertical line from this point to the curve. The point of
intersection on the curve is $(x_1, f(x_1))$, i.e., $(x_1, x_2)$.


4. Now simply repeat the steps going from curve to line and line to 
curve for as many steps as desired. 
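Steps 1 through 4 translate directly into code that produces the corner points of the web plot. The sketch below is our own; the function name is hypothetical. It returns the vertices in order, so joining consecutive points with line segments traces the web.

```python
import math

def web_plot_vertices(f, x0, steps):
    """Corner points of the web plot polyline for x_{t+1} = f(x_t).

    Start on the curve at (x0, f(x0)), then alternately move horizontally
    to the line y = x and vertically back to the curve.
    """
    x = x0
    points = [(x, f(x))]                 # step 1: a point on the curve
    for _ in range(steps):
        x_next = f(x)
        points.append((x_next, x_next))  # step 2: horizontal move to y = x
        x = x_next
        points.append((x, f(x)))         # step 3: vertical move to the curve
    return points

for p in web_plot_vertices(math.sqrt, 16.0, 3):
    print(p)
```

Drawing these points with any line-drawing routine (for example matplotlib's `plot`), together with the curve $y = f(x)$ and the line $y = x$, gives a picture of the kind shown in Figure 10.2.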


FIGURE 10.2. A web plot showing chaos (quadratic model with r = 2.99). 


Figure 10.2 shows the web plot that corresponds to the time plot shown
in Figure 10.1. This web plot may be more informative than the time plot.
For example, the parabola $x[1 + r(1-x)]$ is outlined in the web plot, which
suggests that many or most values of $x$ are visited on the trajectory, but
this plot also suggests that some regions are more frequently visited than
others. Further, the plot suggests that if the trajectory is periodic, then it
has a very long period. Also, the fixed point $x = 1$ is visible, and the
plot suggests that this fixed point tends to repel nearby trajectories.

A full understanding of a nonlinear system might require computing so- 
lutions for essentially all initial conditions. Since this is infeasible, we would 
like some simple ways to summarize the possible behaviors. For arbitrary 
$f(x)$ this is not possible, since there are functions that are complicated
enough to simulate the behavior of Turing machines. For such functions 
there can be no algorithm that determines whether a given initial condition 
is eventually periodic [108], but the functions used in various applications 
tend to be simple enough or have enough reasonable properties to allow 
some sort of analysis. In particular, the assumption that f(x) is continuous 
and sufficiently differentiable is usually enough to obtain some results. 
We often focus on results that are invariant or asymptotic. A property
is invariant if it remains the same along a trajectory. The most common
example of this is a closed physical system in which the total energy is
always the same. Even as the system's variables change, the total energy
remains invariant. Asymptotic properties are those that hold in the
limit as the time $t$ increases. For example, a system might approach a
"steady state" as time increases. The behavior of a nonlinear system is
usually analyzed in terms of the system's fixed points and cycles. The
following definitions make some of these behaviors precise.

As in previous chapters, the iterates of $f$ are defined recursively by
$f^{(0)}(x) = x$ and $f^{(n)}(x) = f(f^{(n-1)}(x))$. A string of distinct points $p_1, \ldots, p_K$
forms a cycle of length $K$, which we call a $K$-cycle, if $f(p_i) = p_{i+1}$ for
$1 \le i < K$ and $f(p_K) = p_1$, so that $f^{(K)}(p_i) = p_i$ for all $1 \le i \le K$. The
points of a $K$-cycle are called periodic of period $K$. A fixed point $p$ is called
attractive if there is a sufficiently small neighborhood of $p$ such that for all
points $q$ in this neighborhood, $\lim_{n\to\infty} f^{(n)}(q) = p$. Similarly, a cycle
$p_1, \ldots, p_K$ is an attractive cycle if there is a neighborhood of the cycle such
that for all points $q$ in this neighborhood, $\lim_{n\to\infty} f^{(nK)}(q)$ is in the set
$\{p_1, \ldots, p_K\}$. On the other hand, a fixed point (or cycle) is repelling if
there is a neighborhood of the point (or cycle) such that for all points $q$ in
this neighborhood other than the point itself (or the points of the cycle),
there is an $n$ such that $f^{(n)}(q)$ is outside the neighborhood.

In the following sections we look at some examples of nonlinear systems 
and see how fixed points and cycles can be shown to be attractive. 
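Before any proofs, one can guess numerically which cycle attracts a given starting point. The sketch below is a heuristic of our own, not a method from the text. It iterates past a transient, then looks for the smallest $K$ with $|f^{(K)}(x) - x|$ below a tolerance. The transient length, the tolerance, the bound on $K$, and the sample values of $r$ for the quadratic model are all arbitrary assumptions. A returned value is evidence of an attractive $K$-cycle, not a proof of one.

```python
def detect_period(f, x0, transient=10_000, max_period=64, tol=1e-9):
    """Guess the length of the cycle attracting x0 under x -> f(x).

    Iterates past a transient, then returns the smallest K <= max_period
    with |f^(K)(x) - x| < tol, or None if no such K is found.
    """
    x = x0
    for _ in range(transient):
        x = f(x)
    y = x
    for k in range(1, max_period + 1):
        y = f(y)
        if abs(y - x) < tol:
            return k
    return None

def quadratic(r):
    """The quadratic model x -> x[1 + r(1 - x)] with parameter r."""
    return lambda x: x * (1 + r * (1 - x))

for r in (1.5, 2.3, 2.5, 2.99):
    print(r, detect_period(quadratic(r), 0.35))
```

A result of `None` (as one may see for $r = 2.99$) only means that no short cycle was found within the chosen limits.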


10.2.1 Sarkovskii’s Theorem 


Nonlinear systems may have many different co-existing cycle lengths. For
example, $x_0$, $f(x_0)$, $f^{(2)}(x_0)$ may all be distinct and $f^{(3)}(x_0) = x_0$, but
$y_0 = f(y_0)$, and then there would be a cycle of length 3 and a cycle of length 1.
Some commonly used functions are the interval maps, which are functions
defined on a bounded interval $[a, b]$ whose values are also in $[a, b]$. For the
special case of continuous one-dimensional interval maps, the question of
which cycle lengths can occur is answered by the following theorem. There is
no analogous result for discontinuous maps or maps in higher dimensions.
While the theorem dates from 1964, it was not known in the United States
until many years later. For example, Li and Yorke [97] proved a
special case in the 1970's, and Cull [32] and Rosenkranz [140] also proved
special cases in the 1980's. The proof of the theorem is beyond the scope
of this book, but it can be found in [51].


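Theorem 10.2.1 (Sarkovskii’s Theorem [144]). If $f(x)$ is a continuous
one-dimensional interval map that has a $K$-cycle, then $f(x)$ also has
cycles of every length less than $K$, where “less” is defined by the
following linear ordering (here $m \succ n$ means that $n$ is less than $m$):

$$\begin{aligned}
&3 \succ 5 \succ 7 \succ 9 \succ \cdots \\
\succ\; &2\cdot 3 \succ 2\cdot 5 \succ 2\cdot 7 \succ \cdots \\
\succ\; &2^2\cdot 3 \succ 2^2\cdot 5 \succ 2^2\cdot 7 \succ \cdots \\
\succ\; &2^3\cdot 3 \succ 2^3\cdot 5 \succ 2^3\cdot 7 \succ \cdots \\
\succ\; &\cdots \succ 2^3 \succ 2^2 \succ 2 \succ 1.
\end{aligned}$$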


Note that there are two extreme cases: when $f(x)$ has a 3-cycle, then it
has cycles of every length, and if $f(x)$ has any cycles other than fixed
points, then it has a 2-cycle.
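
The ordering in Theorem 10.2.1 can be made concrete as a sort key. The sketch below is only an illustration of the ordering, not of the theorem’s proof: it writes $n = 2^k m$ with $m$ odd and gives earlier entries of the list smaller keys, so a $K$-cycle forces every length whose key is at least that of $K$. The helper names `sarkovskii_key` and `forced_periods` are our own.

```python
def sarkovskii_key(n: int) -> tuple:
    """Rank n in Sarkovskii's ordering: a smaller key means earlier in the list.

    The list is 3, 5, 7, ..., 2*3, 2*5, ..., 4*3, 4*5, ..., ..., 8, 4, 2, 1.
    """
    if n < 1:
        raise ValueError("cycle lengths are positive integers")
    k, m = 0, n
    while m % 2 == 0:  # write n = 2**k * m with m odd
        m //= 2
        k += 1
    if m == 1:
        # Pure powers of two come last, largest first: ..., 8, 4, 2, 1.
        return (1, -k)
    # Otherwise group by the power of two, then order by the odd part.
    return (0, k, m)


def forced_periods(K: int, limit: int) -> list:
    """Cycle lengths up to `limit` that a K-cycle forces (K itself included)."""
    key_K = sarkovskii_key(K)
    return [n for n in range(1, limit + 1) if sarkovskii_key(n) >= key_K]


print(forced_periods(3, 12))  # every length from 1 to 12
print(forced_periods(2, 12))  # [1, 2]
print(forced_periods(6, 20))
```

The theorem only guarantees that these cycles exist for a continuous interval map; it says nothing about whether they are attractive.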


10.3 Chaos


Since the ground-breaking papers of Li and Yorke [97] and May [109, 110]
in the mid-1970’s, the importance of chaos in science has been evident. A
variety of different meanings has been attached to the word chaos. For
some, it simply means very complicated-looking behavior (refer to Figures
10.1 and 10.2), while for others, the butterfly effect is the signature of
chaos. In poetic terms, the butterfly effect means that the flapping of a
butterfly’s wings in Borneo may cause a tropical storm in the Caribbean
that devastates the sugar crop and leads to the downfall of Communism. In
the original use of Li and Yorke [97], chaos meant the co-existence of
cycles of every length. Other properties that some authors emphasize are
that chaotic trajectories come near to every point in the space and that
there is a measure that is invariant along trajectories. A detailed
consideration of these issues is beyond the scope of this book, but we
consider some nonlinear systems that have some of these properties.


10.3.1 A simple chaotic system


Several properties of chaos are: 

(a) cycles of every length; 

(b) sensitive dependence on initial conditions; 

(c) the existence of bounded but aperiodic orbits; 

(d) for every open set $A$ and every open set $B$ there is an $x_0 \in A$
and a $K \in \mathbb{N}$ such that $f^{(K)}(x_0) \in B$.

In this section we consider a very simple example that displays all these
chaotic properties. In order to define the system properly, we first need
to discuss the term equivalence mod 1. Two real numbers $x$ and $y$ are
said to be equivalent mod 1 if $x - y$ is an (integer) multiple of 1; that
is, $x$ and $y$ differ by an integer. We symbolize this relationship by
$x \equiv y \bmod 1$, but in the following we will replace $\equiv$ by $=$
and write $x = y \bmod 1$. For example, $3.14 = .14 \bmod 1$. Equivalence
mod 1 can be visualized as a clock face with circumference 1 that has 0 in
the top position, and all the numbers between 0 and 1 are in their
standard positions around the dial. Notice that 1 does not appear, because
$1 \bmod 1 = 0$. That is, if you wrapped a string of length 1 around the
clock face, both the beginning and the end of the string would be at 0. If
your string had length 3/2, wrapping it around the clock face puts the end
at 1/2, which agrees with $3/2 \bmod 1 = 1/2$. Let us consider some initial
conditions for the system


$$x_{t+1} = 2x_t \bmod 1. \tag{10.1}$$


Clearly, 0 is a fixed point, and some points are attracted by 0. For
example, if $x_0 = 1/2$, then $x_1 = 2(1/2) \bmod 1 = 0$. In fact, for any
positive integer $k$, the trajectory of $1/2^k$ goes through
$1/2^k, 1/2^{k-1}, \ldots, 1/2, 0$. This is rather tame behavior. We can
also find periodic behavior. Let’s try $x_0 = 2/3$. Then
$x_1 = 2(2/3) \bmod 1 = 1/3$, and $x_2 = 2/3$, and we see that the
trajectory of $x_0 = 2/3$ is a 2-cycle. More generally, for any positive
integer $k$ the initial condition $x_0 = 2^{k-1}/(2^k - 1)$ gives a
$k$-cycle, since

$$\frac{2^{k-1}}{2^k-1} \;\mapsto\; \frac{1}{2^k-1} \;\mapsto\; \frac{2}{2^k-1} \;\mapsto\; \frac{4}{2^k-1} \;\mapsto\; \cdots \;\mapsto\; \frac{2^{k-1}}{2^k-1}.$$
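
The $k$-cycle claim can be checked with exact rational arithmetic. Floating point is a poor fit here: every float is a finite binary fraction, so (10.1) sends every float to 0 eventually. The minimal sketch below uses Python’s `fractions.Fraction`; the helper names `doubling` and `period` are ours.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """One step of x_{t+1} = 2 x_t mod 1, computed exactly."""
    return (2 * x) % 1


def period(x0: Fraction, max_steps: int = 10_000) -> int:
    """Smallest n >= 1 with f^(n)(x0) == x0, or 0 if none is found."""
    x = doubling(x0)
    for n in range(1, max_steps + 1):
        if x == x0:
            return n
        x = doubling(x)
    return 0


for k in range(2, 9):
    x0 = Fraction(2 ** (k - 1), 2 ** k - 1)
    print(k, x0, period(x0))  # the period printed equals k
```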


This behavior can be seen more readily in binary notation. For example,
multiplying $x = 2/3 = .10101010\ldots$ (in binary) by 2 and then reducing
modulo 1 corresponds to shifting the binary point one place to the right
and dropping any bit to the left of the binary point. So $2(2/3) \bmod 1$
is computed by taking $.10101010\ldots$, shifting the binary point to get
$1.0101010\ldots$, and dropping the 1 to the left of the binary point to
get $.0101010\ldots$, which is
$\frac14 + \frac1{16} + \frac1{64} + \cdots = \frac14\left(1 + \frac14 + \frac1{16} + \cdots\right)$.
Taking the sum of the geometric series, we get $\frac14 \cdot \frac43 = \frac13$.
Similarly, starting with $.0101010\ldots$, shifting the binary point, and
dropping the leading 0, we get $.10101010\ldots$, which is the number 2/3
we started with two steps ago. Here we can see that the period 2 of the
solution to the difference equation follows from the period 2 of the
binary expansion of 2/3.
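
The shift interpretation can also be checked directly. The sketch below (helper names ours) extracts binary digits of an exact fraction by repeated doubling and confirms that one step of (10.1) simply drops the leading digit.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """One step of x_{t+1} = 2 x_t mod 1, computed exactly."""
    return (2 * x) % 1


def binary_digits(x: Fraction, n: int) -> str:
    """First n digits after the binary point of x in [0, 1)."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x)  # 0 or 1, because 0 <= x < 2 here
        digits.append(str(bit))
        x -= bit
    return "".join(digits)


x = Fraction(2, 3)
print(binary_digits(x, 8))            # 10101010
print(binary_digits(doubling(x), 8))  # 01010101, the digits of 1/3
```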

Sensitive dependence on initial conditions is also easy to demonstrate
for this equation. Since 0 is a fixed point of the equation, when 0 is
chosen for the initial condition, the solution remains at 0 forever after.
For a very small positive initial condition $x_0 = \epsilon > 0$, the
trajectory begins at $\epsilon$, goes to $2\epsilon$, to $4\epsilon$, and
so forth. If $\epsilon$ is very small, then $2\epsilon$ and $4\epsilon$ are
also very small, and so for some number of steps the two solutions (the
one with initial condition 0 and the one with initial condition
$\epsilon$) are both nearly 0, and we don’t see any divergence. But since
$\epsilon$ is not zero, it has at least one 1 in its binary expansion; say
the first 1 appears as the $k$th bit. Then after $k - 1$ steps the 1 will
have moved to the first bit position, which means that $x_{k-1} \ge 1/2$,
and the trajectory is now significantly far from the zero trajectory.
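
A small experiment with exact arithmetic makes the count of $k - 1$ steps concrete. In the sketch below (the helper name `steps_until_half` is ours) we take $\epsilon = 2^{-20}$, whose first 1 bit is the 20th, so the trajectory should first reach $[1/2, 1)$ after 19 steps, while the trajectory of 0 never moves.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """One step of x_{t+1} = 2 x_t mod 1, computed exactly."""
    return (2 * x) % 1


def steps_until_half(x0: Fraction, max_steps: int = 1000) -> int:
    """Number of steps before the trajectory of x0 first reaches [1/2, 1)."""
    x = x0
    for n in range(max_steps + 1):
        if x >= Fraction(1, 2):
            return n
        x = doubling(x)
    return -1  # never reached within max_steps (e.g. x0 = 0)


eps = Fraction(1, 2 ** 20)
print(steps_until_half(eps))          # 19
print(steps_until_half(Fraction(0)))  # -1: the zero trajectory stays at 0
```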

Another way to describe sensitive dependence is in terms of information.
How many bits of the initial condition are needed to accurately predict
$x_k$? For our example equation (10.1) we need about $k$ bits to predict
$x_k$. This means that if we could only measure the initial condition to
10 correct bits, we would be at a loss to predict the value of the
solution after 11 steps. Sometimes this is explained as a loss of bits. If
we know the initial condition to 10 correct bits and lose a bit of
accuracy at each step, then we have no bits of accuracy left to predict
the 11th value.

While our example is extreme for showing sensitive dependence, such
dependence can arise in more realistic situations. Consider predicting the
weather. One might measure a number of variables like pressure,
temperature, and wind direction to 10-bit accuracy (about one part in a
thousand). Now assume that we are calculating the weather with a time step
of 1/1000 of a day. To compute tomorrow’s weather we need to calculate
about 1000 steps. If the weather conditions are smooth, we don’t lose many
bits per step. If we lose 1/1000 of a bit per step in smooth conditions, we
lose about 1 bit in tomorrow’s forecast, and we expect our prediction to be
accurate to about 9 bits. On the other hand, if in turbulent conditions we
lose about 1/100 of a bit per step, then in 1000 steps we will have lost
about 10 bits, and so there would be no bits of accuracy left to make
tomorrow’s prediction. This is not just a story, since some of the
original work on chaos was an attempt to describe why weather prediction
is so difficult [100].

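Returning to exhibiting sensitive dependence on initial conditions, let us
note that there was nothing really special about looking at 0. For any two
different but close initial conditions $x_0$ and $x_0 + \epsilon$, the
difference of the $k$th terms of the two sequences is congruent to
$2^k\epsilon \bmod 1$, so the $k$th term of the first sequence differs by
about 1/2 from the $k$th term of the second once $2^k\epsilon \approx 1/2$,
that is, for $k \approx -\log_2 \epsilon$.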
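
The same growth for a nonzero starting point can be sketched as follows. Distances are measured around the mod-1 “clock face” described above, and $x_0 = 3/7$, $\epsilon = 2^{-30}$ are arbitrary choices of ours. Because the difference of the $k$th terms is exactly $2^k\epsilon \bmod 1$, the separation doubles at each step and reaches $1/2$ at $k = 29$.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """One step of x_{t+1} = 2 x_t mod 1, computed exactly."""
    return (2 * x) % 1


def circle_distance(x: Fraction, y: Fraction) -> Fraction:
    """Distance between x and y on the mod-1 clock face."""
    d = (x - y) % 1
    return min(d, 1 - d)


def separation(x0: Fraction, eps: Fraction, k: int) -> Fraction:
    """Distance between the k-th terms of the trajectories of x0 and x0 + eps."""
    x, y = x0, x0 + eps
    for _ in range(k):
        x, y = doubling(x), doubling(y)
    return circle_distance(x, y)


x0, eps = Fraction(3, 7), Fraction(1, 2 ** 30)
for k in (0, 10, 20, 29):
    print(k, float(separation(x0, eps, k)))
```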

Another characteristic of chaos is the existence of bounded but aperiodic
orbits. In our example, all orbits are bounded, since they are limited to
$[0, 1)$. Now we want to determine an initial condition $x_0$ such that
$x_0, x_1, x_2, \ldots$ is an aperiodic sequence. The interpretation of the
equation as a shift on binary sequences makes this easy, because all we
need to do is find an aperiodic binary sequence. For example,
$x_0 = .10100100010000\ldots$ (designed so the 1’s are isolated and the
number of separating 0’s increases by one) is aperiodic because the number
of 0’s between consecutive 1’s is increasing. Using $x_0$ as the initial
condition, the solution starts above 1/2, drops to a lower value, goes
above 1/2, drops to a lower value, builds up for 2 steps, goes above 1/2,
drops to a lower value, builds up for a few steps, goes above 1/2, and so
forth. Why do we write this initial condition as a binary sequence rather
than as a ratio? Because no such ratio exists. That is, the number we have
written is not a rational number, because it has an aperiodic binary
expansion, while every rational number has an eventually periodic binary
expansion.
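
The pattern of visits above $1/2$ can be read off the digits. The sketch below truncates $x_0$ after finitely many digits, so it works with a rational approximation of the true, irrational $x_0$. It then records the steps $n$ at which $x_n \ge 1/2$. These steps are exactly the positions of the 1’s, and the gaps between them grow by one each time. The helper name `aperiodic_prefix` is ours.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """One step of x_{t+1} = 2 x_t mod 1, computed exactly."""
    return (2 * x) % 1


def aperiodic_prefix(ones: int) -> str:
    """Digits 1 0 1 00 1 000 ...: isolated 1's separated by growing runs of 0's."""
    return "".join("1" + "0" * gap for gap in range(1, ones + 1))


bits = aperiodic_prefix(5)                  # "10100100010000100000"
x = Fraction(int(bits, 2), 2 ** len(bits))  # truncated x0 as an exact fraction
above_half = []
for n in range(len(bits)):
    if x >= Fraction(1, 2):
        above_half.append(n)
    x = doubling(x)
print(above_half)  # [0, 2, 5, 9, 14]
```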

Next, for any two open sets $A$ and $B$ in $[0, 1)$ we want to show that
there is an initial condition in $A$ whose trajectory visits $B$. Because
$A$ is open, $A$ contains an open interval of the form
$(a - 2^{-j}, a + 2^{-j})$. Let $b \in B$; we will construct an element of
$A$’s open interval that is eventually mapped to $b$. (Notice that this is
an even stronger condition than required.) Let $c = .a_1a_2\ldots a_j$ be
the rational number formed from the first $j$ bits in the binary
expansion of $a$. Let $K = j + 1$ and $x_0 = c + 2^{-K}b$. We see that
$x_0 \in A$ because $a - 2^{-j} < c \le x_0 < c + 2^{-K} \le a + 2^{-K}$,
and by design, $f^{(K)}(x_0) = b$, since $2^K c$ is an integer.
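
The construction of $x_0 = c + 2^{-K}b$ is easy to check exactly. In this sketch the values of $a$, $j$, and $b$ are arbitrary samples of ours, and `hit_point` is a hypothetical helper name.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """One step of x_{t+1} = 2 x_t mod 1, computed exactly."""
    return (2 * x) % 1


def hit_point(a: Fraction, j: int, b: Fraction):
    """Return (x0, K) with x0 within 2**-j of a and f^(K)(x0) == b."""
    c = Fraction(int(a * 2 ** j), 2 ** j)  # c = .a1 a2 ... aj (a truncated)
    K = j + 1
    return c + b / 2 ** K, K


a, j, b = Fraction(1, 3), 5, Fraction(5, 7)
x0, K = hit_point(a, j, b)
x = x0
for _ in range(K):
    x = doubling(x)
print(x0, K, x)  # 145/448 6 5/7
```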

A similar construction yields a trajectory that visits every open set. (In
what follows we refer to an open interval with two rational endpoints as a
“rational interval.”) Since every nonempty open set contains a rational
interval, a trajectory can be shown to visit every open set by showing
that it visits every rational interval. The rational numbers are
countable, which means that they can be put into one-to-one correspondence
with the natural numbers. There are many such correspondences, and any one
of them gives an ordering of the rationals, and so we can speak of the
first rational, the second rational, and so forth. But since we have just
mapped the rationals to the natural numbers, we can map the upper and
lower endpoints for a rational interval (and so the interval itself) to a
pair of natural numbers. Hence, the set of rational intervals can be
countably ordered. Since every rational interval contains a rational $r$
with a terminating binary expansion, there is a natural number $j$ such
that $(r - 2^{-j}, r + 2^{-j})$ is inside the interval. If our trajectory
visits each of these special intervals, it will visit all rational
intervals and hence all open sets.

We are now ready to pick an initial condition $x_0$ whose trajectory
visits every open set. For the sequence $(r_i)$ of terminating rationals
(ordered according to our ordering of rational intervals) define the
initial condition $x_0$ by the binary expansion

$$x_0 = .\; r_1 \;(j_1 + 1)\,0\text{'s}\; r_2 \;(j_2 + 1)\,0\text{'s}\; r_3 \;(j_3 + 1)\,0\text{'s}\; \cdots$$

(where $r_i$ stands for the finite string of binary digits of $r_i$, and
$(j_i + 1)\,0$’s is our abbreviation for $j_i + 1$ consecutive 0’s). By
construction, $x_0$ lies in the first interval $(r_1, r_1 + 2^{-j_1})$;
after some number of iterations, $x_n$ lies in the second interval; and in
general, after an appropriate number of iterations, there is an $x_n$ that
lies in the interval $(r_i, r_i + 2^{-j_i})$. This means that the
trajectory starting at this $x_0$ eventually visits every open set. With
this we see that the system given in (10.1) satisfies all four of our
given properties of chaos.
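
Only a finite part of this construction can be computed, but that part can be checked. The sketch below builds the first few blocks of the binary expansion from a short, arbitrarily chosen list `targets` of terminating rationals $r_i$ (given by their digit strings) and widths $j_i$. It then verifies that the trajectory passes through each interval $[r_i, r_i + 2^{-j_i})$. Because the expansion is truncated, $x_n$ can land exactly on $r_i$, hence the half-open interval. The list is a hypothetical example, not an enumeration of all rational intervals.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """One step of x_{t+1} = 2 x_t mod 1, computed exactly."""
    return (2 * x) % 1


def from_bits(bits: str) -> Fraction:
    """Value of the binary expansion .bits as an exact fraction."""
    return Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)


# (binary digits of r_i, j_i): a hypothetical start of the enumeration
targets = [("1", 2), ("01", 3), ("011", 4), ("1101", 6)]

digits, starts = "", []
for r_bits, j in targets:
    starts.append(len(digits))        # step at which block r_i begins
    digits += r_bits + "0" * (j + 1)  # r_i followed by (j_i + 1) zeros
x0 = from_bits(digits)

for (r_bits, j), n in zip(targets, starts):
    x = x0
    for _ in range(n):
        x = doubling(x)
    r = from_bits(r_bits)
    print(r, n, r <= x < r + Fraction(1, 2 ** j))  # True for every block
```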
10.4 Local Stability


10.4.1 Local stability of a fixed point 


For a fixed point $x_0$ we would like to know whether nearby values of $x$
are attracted to it. In order to make this idea more precise, we say that
a fixed point $x_0$ of $f(x)$ is locally stable if for every $y$ in every
small enough neighborhood of $x_0$, $f(y)$ is in the neighborhood; that is,

$$|f(y) - x_0| \le |y - x_0|,$$

and also $\lim_{k\to\infty} f^{(k)}(y) = x_0$.

Testing whether a point is locally stable may seem difficult, but when 
f(x) is differentiable, there is the reasonably easy test given in the next 
theorem. 


Theorem 10.4.1. Let $x_0$ be a fixed point of $f(x)$ and assume that
$f(x)$ is differentiable at $x_0$. Then $x_0$ is locally stable if
$|f'(x_0)| < 1$, and if $x_0$ is locally stable, then $|f'(x_0)| \le 1$.


Proof. If $f(x)$ is differentiable at $x_0$, then

$$\frac{|f(y) - f(x_0)|}{|y - x_0|} \to |f'(x_0)| \quad \text{as } y \to x_0.$$

If $|f'(x_0)| < 1$, then there is a $\delta > 0$ such that for all $y$ very
close to $x_0$, this ratio is less than $1 - \delta$. Since $x_0$ is a
fixed point of $f(x)$, this gives

$$|f(y) - x_0| \le (1 - \delta)|y - x_0| < |y - x_0|,$$

and $f(y)$ is in the same neighborhood as $y$. Further, letting
$d_k = |f^{(k)}(y) - x_0|$ gives

$$d_k \le (1 - \delta)\,d_{k-1} \le (1 - \delta)^k d_0,$$

and $\lim_{k\to\infty} d_k = 0$. Therefore, $x_0$ is locally stable.
Conversely, if $x_0$ is locally stable, then for $y$ close to $x_0$,

$$|f(y) - x_0| = |f(y) - f(x_0)| \le |y - x_0|,$$

and since $f(x)$ is differentiable,

$$|f'(x_0)| = \lim_{y \to x_0} \frac{|f(y) - f(x_0)|}{|y - x_0|} \le 1. \qquad \square$$


Let’s look a bit more closely at this proof. The basic idea is that when
we are close enough to a fixed point we can approximate a nonlinear
difference equation by a linear difference equation. Consider the
difference equation $x_{t+1} = f(x_t)$ near a fixed point $E$. Define a
new variable $d_t = |x_t - E|$ and consider the linear equation
$d_t = (1 - \delta)d_{t-1}$. We have translated the fixed point $E$ for $x$
to the fixed point 0 for $d$. If $f(x)$ is differentiable with
$|f'(E)| < 1$, then the linear equation gives an upper bound on the
behavior of $|x_t - E|$ in a small enough neighborhood of $E$, and the
convergence of the linear equation to 0 implies the convergence of the
nonlinear equation to $E$.

Linear approximations can also be made around non-fixed points, but we 
will not get as much useful information. Near a fixed point $E$,
$f(E + \epsilon) \approx E + f'(E)\epsilon$, so nearby points tend to stay nearby. For non-fixed points,
nearby points may not stay nearby. For fixed points we may be able to 
repeatedly use the same linear approximation, but for non-fixed points we 
will continually need new linear approximations. For numerical solutions 
to equations, linear approximations are often used, but we have to face the 
prospect that the linearly computed solution may be very far from the true 
solution. 

Notice that the statement of this theorem has a small lacuna, since it
does not say what happens when $|f'(x_0)| = 1$. Essentially this is
because we are making a linear approximation to $f(x)$, and this linear
approximation dominates the nonlinear behavior locally if
$|f'(x_0)| < 1$, but the nonlinear terms are necessary to determine
behavior when $|f'(x_0)| = 1$. For example, consider

$$x_{t+1} = f(x_t) = x_t\,[1 + r(1 - x_t)],$$

which has a fixed point at $E = 1$. The derivative is
$f'(x) = 1 + r - 2rx$, and so $|f'(1)| = |1 - r| < 1$ if $0 < r < 2$. By
the theorem, $x = 1$ is locally stable when $0 < r < 2$. Let’s see what
happens in the special cases $r = 0$ and $r = 2$, that is, when
$|f'(1)| = 1$. When $r = 0$, $f(x) = x$, and if the system starts near 1,
it stays near 1 but the iterates do not converge to 1. On the other hand,
when $r = 2$, $f(x) = 1 - (x - 1) - 2(x - 1)^2$, and starting at
$1 - \epsilon$ gives $1 + \epsilon - 2\epsilon^2$, which is in the same
$\epsilon$ neighborhood of 1. But starting at $1 + \epsilon$ gives
$1 - \epsilon - 2\epsilon^2$, which is not in the same $\epsilon$
neighborhood of 1. In spite of this, $f(f(x))$ for $x$ near 1 is always
closer to 1, and $\lim_{k\to\infty} f^{(k)}(x) = 1$ for all $x$ in
$(0, 3/2)$. This suggests that our definition of “local stability” may
not be ideal, and the interested reader should consult the literature for
other definitions.
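
A quick numerical experiment illustrates both the theorem and the lacuna for $f(x) = x[1 + r(1 - x)]$. The sketch below uses plain floating point; the starting point 1.3 and the step counts are our choices. It iterates the map for $r = 1.5$ ($|f'(1)| = 0.5$), $r = 2$ ($|f'(1)| = 1$), and $r = 2.5$ ($|f'(1)| = 1.5$). In the first case the error shrinks geometrically. In the borderline case it shrinks only slowly, because the cubic term in $f(f(x))$ does the work. In the last case the iterates do not settle at 1.

```python
def f(x: float, r: float) -> float:
    """The map x -> x [1 + r (1 - x)]; its fixed point 1 has f'(1) = 1 - r."""
    return x * (1 + r * (1 - x))


def distance_from_one(r: float, x0: float, steps: int) -> float:
    """|x_t - 1| after `steps` iterations starting from x0."""
    x = x0
    for _ in range(steps):
        x = f(x, r)
    return abs(x - 1)


print(distance_from_one(1.5, 1.3, 100))      # tiny: geometric convergence
print(distance_from_one(2.0, 1.3, 100_000))  # roughly 1e-3: very slow convergence
print(distance_from_one(2.5, 1.3, 1_000))    # stays well away from 1
```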


10.4.2 Local stability of a cycle 
Stability for a cycle is similar to stability for a point. Recall that the
system

$$x_{t+1} = f(x_t)$$

has a $K$-cycle $x_1, x_2, \ldots, x_K$ if $f(x_i) = x_{i+1}$ for all
$i = 1, \ldots, K - 1$ and $f^{(K)}(x_1) = x_1$. It is convenient to note
that each of the points $x_1, x_2, \ldots, x_K$ is a fixed point of the
$K$-fold iterate $f^{(K)}(x)$, and we say that $x_1, x_2, \ldots, x_K$ is
a locally stable cycle when each of the points $x_1, x_2, \ldots, x_K$ is
a locally stable fixed point of $f^{(K)}(x)$.

We would like to be able to simplify things by checking for the local
stability of only one of the points, but there is the worry that one of
the points could be locally stable, while some of the other points are
not. Luckily, when we assume that $f(x)$ is differentiable, these worries
vanish because the stability condition at one point implies the stability
condition at each of the other points. The reason for this is the Chain
Rule, since the derivative is

$$D f^{(K)}(x) = f'\big(f^{(K-1)}(x)\big)\, D f^{(K-1)}(x),$$

and so

$$D f^{(K)}(x)\Big|_{x = x_1} = f'(x_K)\, f'(x_{K-1}) \cdots f'(x_1).$$

This is also the value of the derivative at every point in the cycle,
because taking another point only results in the terms in the product
appearing in a different order! Therefore, a sufficient condition for
local stability of the $K$-cycle $x_1, x_2, \ldots, x_K$ is

$$|f'(x_K)|\, |f'(x_{K-1})| \cdots |f'(x_1)| < 1.$$
As an example, let us again consider

$$x_{t+1} = f(x_t) = x_t\,[1 + r(1 - x_t)],$$

this time with $r$ slightly larger than 2. We will show that there is a
locally stable 2-cycle.
The condition for a 2-cycle for this function is

$$f^{(2)}(x) = x = x\,[1 + r(1 - x)]\,[1 + r(1 - f(x))].$$

We can eliminate the fixed point at $x = 0$ by dividing this equation by
$x$. Simplifying the resulting equation by subtracting 1 and dividing by
$r$ gives

$$(1 - x) + (1 - f(x)) + r(1 - x)(1 - f(x)) = 0.$$

Now, since $1 - f(x) = (1 - x)(1 - rx)$, the fixed point at $x = 1$ can be
eliminated by dividing by $1 - x$, and after substituting for the
remaining $f(x)$, this gives the quadratic equation

$$2 + r - r(2 + r)x + r^2x^2 = 0,$$
which has two distinct real roots, since $r > 2$. (This can be seen by
calculating the discriminant of the polynomial,
$r^2(2 + r)^2 - 4r^2(2 + r) = r^2(2 + r)(r - 2)$.) We could calculate
these roots, $x_1$ and $x_2$, but this polynomial already tells us that

$$x_1 x_2 = \frac{2 + r}{r^2} \quad\text{and}\quad x_1 + x_2 = \frac{2 + r}{r}. \tag{10.2}$$


Remember that the sufficient condition for local stability of the cycle is

$$|f'(x_1)\, f'(x_2)| < 1.$$

It is easy to compute

$$f'(x) = 1 + r - 2rx$$

and

$$f'(x_1)\, f'(x_2) = (1 + r)^2 - 2r(1 + r)(x_1 + x_2) + 4r^2 x_1 x_2.$$

Using the formulas from (10.2) for $x_1 x_2$ and $x_1 + x_2$ gives

$$f'(x_1)\, f'(x_2) = (1 + r)^2 - 2(1 + r)(2 + r) + 4(2 + r) = 5 - r^2,$$

and the local stability condition $|5 - r^2| < 1$ becomes

$$2 < r < \sqrt{6}.$$

Hence, the system has a locally stable 2-cycle when $r$ satisfies this
inequality.
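
The algebra can be cross-checked numerically. The sketch below (helper names ours) solves the quadratic $r^2x^2 - r(2 + r)x + (2 + r) = 0$, confirms that $f$ swaps its two roots, and compares $f'(x_1)f'(x_2)$ with $5 - r^2$. It then iterates the map for $r = 2.2$, a value we chose inside the window $2 < r < \sqrt{6} \approx 2.449$.

```python
import math


def f(x: float, r: float) -> float:
    return x * (1 + r * (1 - x))


def fprime(x: float, r: float) -> float:
    return 1 + r - 2 * r * x


def two_cycle(r: float):
    """Roots of r^2 x^2 - r(2 + r) x + (2 + r) = 0 (real when r > 2)."""
    a, b, c = r * r, -r * (2 + r), 2 + r
    s = math.sqrt(b * b - 4 * a * c)
    return (-b - s) / (2 * a), (-b + s) / (2 * a)


r = 2.2
x1, x2 = two_cycle(r)
multiplier = fprime(x1, r) * fprime(x2, r)
print(x1, x2, multiplier, 5 - r * r)  # multiplier and 5 - r^2 both near 0.16

x = 0.9  # arbitrary starting point
for _ in range(1000):
    x = f(x, r)
print(min(abs(x - x1), abs(x - x2)))  # essentially 0: the orbit sits on the cycle
```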


10.4.3 Local stability in two dimensions


We can use the techniques of the previous subsections to look at the local
stability of a nonlinear system in more than one dimension. The only
modifications are that we need to generalize the idea of neighborhood and
the idea of derivative. A neighborhood in one dimension is an open
interval $(x - \epsilon, x + \epsilon)$. For a neighborhood in more than
one dimension we use the higher-dimensional ball of radius $\epsilon$. So
an $\epsilon$-neighborhood of a point $x$ is the set of all points $y$
such that $|x - y| < \epsilon$, where we interpret the absolute value to
mean Euclidean distance in the appropriate dimension.

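A linear approximation to an $n$-dimensional function should consist of
$n$ linear functions, one for each dimension. Specifically, if

$$F(x, y) = \begin{bmatrix} f_1(x, y) \\ f_2(x, y) \end{bmatrix}$$

is a two-dimensional function, our analog of the derivative is

$$J(x, y) = \begin{bmatrix} \dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} \\ \dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y} \end{bmatrix},$$

the Jacobian matrix, which is also important in vector calculus. The
partial derivatives inside this matrix are functions that are evaluated
at the point $(x, y)$ at which we are making the linear approximation.
For example, if

$$f_1(x, y) = xy \quad\text{and}\quad f_2(x, y) = x + y^2,$$

then the Jacobian matrix is

$$J(x, y) = \begin{bmatrix} y & x \\ 1 & 2y \end{bmatrix}.$$

If we want a linear approximation near the point $(0, 0)$, then the
Jacobian matrix evaluated at $(0, 0)$ becomes

$$J(0, 0) = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},$$

and the linear approximation at $(0, 0)$ is

$$F(x, y) \approx F(0, 0) + J(0, 0)\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ x \end{bmatrix}.$$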
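
The Jacobian can also be approximated by finite differences, which is a handy check on hand-computed partial derivatives. This sketch uses central differences with a step size `h` of our choosing and compares the result with $J(x, y)$ for $f_1 = xy$, $f_2 = x + y^2$ at a few sample points.

```python
def F(x: float, y: float):
    """The example map (f1, f2) = (x y, x + y^2)."""
    return (x * y, x + y * y)


def numeric_jacobian(F, x: float, y: float, h: float = 1e-6):
    """2x2 Jacobian of F at (x, y) by central differences; row i is grad f_i."""
    fxp, fxm = F(x + h, y), F(x - h, y)
    fyp, fym = F(x, y + h), F(x, y - h)
    return [[(fxp[i] - fxm[i]) / (2 * h), (fyp[i] - fym[i]) / (2 * h)]
            for i in range(2)]


def exact_jacobian(x: float, y: float):
    return [[y, x], [1.0, 2 * y]]


for point in [(0.0, 0.0), (1.5, -2.0), (0.3, 0.7)]:
    approx, exact = numeric_jacobian(F, *point), exact_jacobian(*point)
    err = max(abs(approx[i][k] - exact[i][k]) for i in range(2) for k in range(2))
    print(point, err)  # errors at the level of rounding noise
```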


As this example shows, we still have a problem, because the derivative 
in one dimension was a single number, but the Jacobian matrix is an array 
of numbers. The idea in one dimension was that around a fixed point the 
linear approximation can be viewed as a linear difference equation. And 
the solutions of this equation converge to the fixed point if the value of the 
derivative is less than 1 in absolute value, because the value of the derivative 
is also the eigenvalue of the associated difference equation. Similarly, the 
eigenvalues of the Jacobian matrix are the eigenvalues of a matrix linear 
difference equation around a fixed point, and the solutions converge to the 
fixed point if all the eigenvalues are less than one in absolute value. This 
result generalizes to all dimensions, as the following theorem indicates. (For 
a proof, refer to [92].) 




Theorem 10.4.2. If F is an n-dimensional differentiable function with 
fixed point X and J is the Jacobian matrix of F evaluated at X, then X 
is a locally stable fixed point if all eigenvalues of J have absolute value less 
than 1. If at least one of these absolute values is strictly greater than 1, the 
fixed point is unstable. 
As an example, consider the system

$$x_{t+1} = \sin(y_t), \qquad y_{t+1} = \cos(x_t),$$

whose Jacobian matrix is

$$J(x, y) = \begin{bmatrix} 0 & \cos y \\ -\sin x & 0 \end{bmatrix},$$

with characteristic polynomial $\mathrm{ch}(\lambda) = \lambda^2 + \sin x \cos y$
and eigenvalues $\pm\sqrt{-\sin x \cos y}$. Depending on the values of $x$
and $y$, these roots may be real or complex. But we know that
$|\sin x| \le 1$ and $|\cos y| \le 1$, and so the modulus of each of these
roots is at most 1. More strongly, if $(x, y)$ is a fixed point, then
$|x| \le 1$ and $|y| \le 1$, so $|\sin x| < 1$; moreover
$y = \cos x \ge \cos 1 > 0$, so $|\cos y| < 1$. So the eigenvalues are
strictly less than 1 in absolute value, and all fixed points are locally
stable.

The fixed points of this system can be approximated by plotting the curve
$y = \cos x$ and the curve $x = \sin y$ in the $xy$-plane; the
intersections are the fixed points. From this one can see that there is
only one fixed point, and its coordinates satisfy $0 < x < 1$ and
$0 < y < 1$. This system has one fixed point, and it is locally stable.
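
These claims can be spot-checked numerically. The sketch below uses only the standard library, and the starting point and step count are our choices. It iterates the map to locate the fixed point, then computes the eigenvalues of $J$ there from the trace and determinant of the $2 \times 2$ matrix.

```python
import cmath
import math


def step(x: float, y: float):
    """One step of x_{t+1} = sin(y_t), y_{t+1} = cos(x_t)."""
    return math.sin(y), math.cos(x)


def eigenvalues_2x2(a: float, b: float, c: float, d: float):
    """Eigenvalues of [[a, b], [c, d]] from its trace and determinant."""
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


x, y = 0.0, 0.0
for _ in range(200):
    x, y = step(x, y)

lams = eigenvalues_2x2(0.0, math.cos(y), -math.sin(x), 0.0)
print(x, y)                        # the fixed point, with 0 < x, y < 1
print([abs(lam) for lam in lams])  # both moduli are below 1
```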

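Another simple example of a two-dimensional system is

$$x_{t+1} = x_t + x_t(1 - x_t - y_t)/6, \qquad y_{t+1} = y_t(1 + x_t - y_t).$$

For fixed points $p = (x, y)$,

$$x = x + x(1 - x - y)/6, \qquad y = y(1 + x - y),$$

and simplifying these gives

$$x(1 - x - y) = 0, \qquad y(x - y) = 0,$$

which has the three solutions $p = (0, 0), (1, 0), (1/2, 1/2)$. The matrix
of partial derivatives for the system is

$$J(x, y) = \begin{bmatrix} 1 + (1 - x - y)/6 - x/6 & -x/6 \\ y & 1 + x - 2y \end{bmatrix}.$$

The Jacobian matrix for the fixed point $(0, 0)$ is

$$J(0, 0) = \begin{bmatrix} 7/6 & 0 \\ 0 & 1 \end{bmatrix},$$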
which has an eigenvalue $7/6 > 1$, and so $(0, 0)$ is unstable. The
Jacobian matrix for the fixed point $(1, 0)$ is

$$J(1, 0) = \begin{bmatrix} 5/6 & -1/6 \\ 0 & 2 \end{bmatrix},$$

and again there is an eigenvalue ($\lambda = 2$) larger than 1, and this
fixed point is unstable. Finally, the fixed point $(1/2, 1/2)$ has the
Jacobian matrix

$$J(1/2, 1/2) = \begin{bmatrix} 11/12 & -1/12 \\ 1/2 & 1/2 \end{bmatrix},$$

whose characteristic polynomial is
$\mathrm{ch}(\lambda) = \lambda^2 - \frac{17}{12}\lambda + \frac12$, and its
eigenvalues are 2/3 and 3/4, and so this fixed point is locally stable.
For this system, one could make the reasonable guess that if the system
were started at any point that is not a fixed point, the trajectory
eventually would approach the fixed point $(1/2, 1/2)$. While this guess
is reasonable, one still needs to rule out other possible behaviors. For
example, there could be cycles or aperiodic orbits that simply don’t show
up in a fixed point analysis.
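
The eigenvalue computations above can be reproduced in a few lines. In this sketch (standard library only; helper names ours) the Jacobian is entered from the formula for $J(x, y)$, and each matrix’s eigenvalues come from its trace and determinant. A trajectory started near $(1/2, 1/2)$, at a point we chose, is then iterated to illustrate local stability. Starting close to the fixed point matters, since local stability by itself says nothing about distant starting points.

```python
import cmath


def step(x: float, y: float):
    """One step of the two-dimensional system."""
    return x + x * (1 - x - y) / 6, y * (1 + x - y)


def jacobian(x: float, y: float):
    return ((1 + (1 - x - y) / 6 - x / 6, -x / 6),
            (y, 1 + x - 2 * y))


def eigenvalues(m):
    """Eigenvalues of a 2x2 matrix ((a, b), (c, d)) via trace and determinant."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


for p in [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5)]:
    print(p, sorted(abs(lam) for lam in eigenvalues(jacobian(*p))))
# expected moduli: (0, 0) -> 1, 7/6; (1, 0) -> 5/6, 2; (1/2, 1/2) -> 2/3, 3/4

x, y = 0.45, 0.55
for _ in range(500):
    x, y = step(x, y)
print(x, y)  # close to (0.5, 0.5)
```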


10.5 Global Stability 


A locally stable point attracts trajectories within a small neighborhood of 
the point, but one is usually interested in larger neighborhoods. At the 
extreme, the neighborhood of interest could be the whole space. 

We encapsulate this by saying that a fixed point $p$ of
$x_{t+1} = f(x_t)$ is globally stable for $f(x)$ on the set $B$ if
$\lim_{k\to\infty} f^{(k)}(b) = p$ holds for each initial condition $b$ in $B$.

Usually the set B is not specified, since by context one can tell that B is 
the reals or the positive reals or the interval [0, 1] or some other reasonable 
set. One would also like to say that a cycle or other invariant set is globally 
stable, but this is difficult, since fixed points and the special trajectories 
that lead to them might not converge to the cycle. People usually solve this 
problem by saying that the cycle is globally stable without mentioning the 
existence of the relatively few points that do not lead to the cycle. 

Unlike local stability, there is no nice characterization of global stability 
even if the function is differentiable. The classical technique to show global 
stability is to find a “Liapunov” or “energy” function, and show that this 
function is nonnegative and equal to 0 only at the fixed point, and that 
the “energy” decreases along each trajectory. (See LaSalle [92] for details.) 
Unfortunately finding a Liapunov function and showing that it has the re- 
quired properties is a formidable task. In Section 10.6 we will see that some 
simple functions can be used to prove global stability in one dimension. Of 
course, in higher dimensions global stability is harder to deal with. 
10.5.1 Staircase convergence


In Figure 10.2 we saw a rather complicated-looking web plot. But we might 
hope that for some systems that display global stability, the web plot would 
be much simpler. In particular, some systems have a web plot that looks 
like a staircase that leads up and down to the globally stable fixed point. 
We’ll first give a staircase theorem and then apply it to several examples. 


Theorem 10.5.1 (Staircase Theorem). Let $f(x)$ be a continuous function
on the interval $(a, b)$. If $x < f(x) \le p$ on $(a, p)$ and
$p \le f(x) < x$ on $(p, b)$, then $\lim_{n\to\infty} f^{(n)}(x_0) = p$ for
all $x_0$ in $(a, b)$.


A formal proof of this theorem would argue that for $x_0$ in $(a, p)$,
$f^{(n)}(x_0)$ forms an increasing but bounded sequence, so this sequence
has a limit, and argue from the continuity of $f$ that any limit must be
a fixed point of $f$; since $f(x) > x$ on $(a, p)$, that fixed point can
only be $p$. A similar argument for $x_0$ in $(p, b)$ would then show that
$p$ is the limit for every $x_0$. A more visual argument simply follows
the web plot. For $x_0$ in $(a, p)$, the “staircase” starts at
$(x_0, x_0)$. The first “riser” goes up to $(x_0, f(x_0))$. The “stair”
goes across to $(f(x_0), f(x_0))$. Of course, this last point is closer to
$(p, p)$ than $(x_0, x_0)$ was. Continuing in this fashion, it is evident
that the staircase builds up toward (or hits) $(p, p)$. Notice that the
convergence is monotone. If the sequence starts below the fixed point, it
is always increasing until it hits the fixed point, and if the sequence
starts above the fixed point, it is always decreasing until it hits the
fixed point.

With this theorem’s assumptions, the difference equation

$$x_{t+1} = f(x_t)$$

obeys $\lim_{t\to\infty} x_t = p$ for every choice of initial condition in
$(a, b)$. For example, the linear difference equation

$$x_{t+1} = \tfrac12 x_t$$

obeys $\lim_{t\to\infty} x_t = 0$ for every choice of initial condition in
$(-\infty, \infty)$, because $x < \frac{x}{2} < 0$ on $(-\infty, 0)$, and
$0 < \frac{x}{2} < x$ on $(0, \infty)$.


For the nonlinear difference equation $x_{t+1} = f(x_t)$, where


_ J#@—2) on (0,1), 
jay eo? on (1,2), 


the theorem says that $\lim_{t\to\infty} x_t = 1$ for all $x_0 \in (0, 2)$.
Notice that while $f(x)$ is continuous, $f'(x)$ is not continuous at
$x = 1$, but neither differentiability nor monotonicity of $f(x)$ is
required by the theorem. For example, the nonlinear difference equation
$x_{t+1} = f(x_t)$ with

$$f(x) = \begin{cases} 4x & \text{on } (0, 1/4), \\ 1 - (x - 1/4) & \text{on } (1/4, 1/2), \\ 3/4 + (x - 1/2) & \text{on } (1/2, 3/4), \\ 1 & \text{on } (3/4, 5), \end{cases}$$

satisfies the theorem’s assumptions with $p = 1$, and
$\lim_{t\to\infty} x_t = 1$ for all $x_0 \in (0, 5)$.
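
This piecewise example shows that the staircase can reach $p$ in finitely many steps. The sketch below is a direct transcription of $f$. At the shared endpoints, where adjacent pieces agree, we assign the value to the left-hand piece. The helper name `trajectory` is ours.

```python
def f(x: float) -> float:
    """The piecewise map from the text, defined on (0, 5)."""
    if x <= 0.25:
        return 4 * x
    if x <= 0.5:
        return 1 - (x - 0.25)
    if x <= 0.75:
        return 0.75 + (x - 0.5)
    return 1.0


def trajectory(x0: float, steps: int = 10):
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs


print(trajectory(0.1, 4))  # climbs 0.1, 0.4, about 0.85, then sits at 1.0
print(trajectory(4.9, 2))  # one step down to 1.0
```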


As another example, let’s consider $x_{t+1} = e^{2(x_t - 1)}$. We claim
that there exists a fixed point $p$, where $0 < p < 1$ and

$$\begin{aligned} x < f(x) < p \quad &\text{for } x \in (0, p), \\ p < f(x) < x \quad &\text{for } x \in (p, 1). \end{aligned} \tag{10.3}$$

Even though we don’t know the exact value of $p$, we can still argue that
$p$ exists. Let $g(x) = e^{2(x-1)} - x$. Clearly, $g(0) > 0$ and
$g(1/2) < 0$, so $e^{2(x-1)}$ has a fixed point in $(0, 1/2)$. Computing
$g'(x) = 2e^{2(x-1)} - 1$ and $g''(x) = 4e^{2(x-1)}$, we find that $g'(x)$
is an increasing function that is negative at $x = 0$ and positive at
$x = 1$. Hence $g(x)$, being convex, has exactly two roots, one at $x = 1$
and one at $x = p$ with $0 < p < 1$. Further, this argument shows that the
bounds (10.3) needed for the staircase theorem hold: the sign of $g$
compares $f(x)$ with $x$, and since $f$ is increasing, $f(x) < f(p) = p$
on $(0, p)$ and $f(x) > p$ on $(p, 1)$. So starting at any
$x_0 \in (0, 1)$, $\lim_{t\to\infty} x_t = p$. The pleasant conclusion is
that even though we don’t know the value of $p$, we can use the iterates
of the difference equation $x_{t+1} = e^{2(x_t - 1)}$ to calculate an
approximate value of $p$.
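
The last remark is easy to act on. In the minimal sketch below (starting values and iteration counts are our choices) we iterate $x_{t+1} = e^{2(x_t - 1)}$ from one point below $p$ and one above, and check that the two staircases approach the same limit from opposite sides. The monotonicity check uses only the first 30 steps, before the differences shrink to the level of floating-point rounding.

```python
import math


def f(x: float) -> float:
    return math.exp(2 * (x - 1))


def iterate(x0: float, steps: int):
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs


low, high = iterate(0.01, 30), iterate(0.9, 30)
print(all(a < b for a, b in zip(low, low[1:])))    # True: the staircase climbs
print(all(a > b for a, b in zip(high, high[1:])))  # True: the staircase descends

p = iterate(0.5, 100)[-1]
print(p, abs(f(p) - p))  # p is about 0.203; the residual is at rounding level
```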


10.5.2 Nonmonotonic convergence


Surprisingly enough, the staircase theorem can be used to show global
stability for functions that do not satisfy its hypotheses. The “trick” is
that even if $f(x)$ does not satisfy the hypotheses, the iterate
$f(f(x))$ might.

Take a look at Figure 10.3, where $f(x) = xe^{2(1-x)}$. Clearly, $f(x)$
does not satisfy the hypotheses, because there is a point $p_0$ with
$f(p_0) = 1$ and $p_0 < 1$, so that $f(x) > 1$ for all $x \in (p_0, 1)$.
But for all $x \in (p_0, 1)$, $f(f(x)) < 1$, since $f(y) < 1$ for every
$y > 1$. Now if we can show that $f(f(x)) > x$ for these $x$’s, we will
have half the staircase hypothesis for $f^{(2)}$. We want
$f(f(x)) = xe^{2(1-x) + 2(1-f(x))} > x$, but dividing by $x$ and taking
logarithms, this is equivalent to

$$2 - x > f(x),$$

and you can see from Figure 10.3 that this inequality holds. For the
interval $(1, 2 - p_0)$, from the figure, $1 > f(x) > 2 - x > p_0$, so
$f(x) \in (p_0, 1)$ and hence $f(f(x)) > 1$. Also, for this interval,
$f(f(x)) = xe^{2(1-x) + 2(1-f(x))} < x$, because $2 - x < f(x)$. So the
second iterate $f(f(x))$ satisfies the staircase hypotheses for the
interval $(p_0, 2 - p_0)$, and hence $\lim_{n\to\infty} f^{(2n)}(x_0) = 1$
holds for every $x_0$ in this interval. Further, since $f(x_0)$ is also
in this interval, $\lim_{n\to\infty} f^{(2n+1)}(x_0) = 1$, and
$\lim_{m\to\infty} f^{(m)}(x_0) = 1$ holds for all $x_0$ in $(p_0, 2 - p_0)$.

FIGURE 10.3. Showing global stability for a simple model. The curve
$f(x)$ is bounded by the line $y = 2 - x$.

We still have a little cleaning up to do. If $x_0 \in (0, p_0)$, then for
some $k$, $f^{(k)}(x_0) \in (p_0, 1)$, and if $x_0 > 2 - p_0$, then
$f(x_0) \in (0, 1)$. So for any $x_0 \in (0, \infty)$, some iterate falls
into the interval where the staircase theorem applies, and
$\lim_{m\to\infty} f^{(m)}(x_0) = 1$ for all $x_0 \in (0, \infty)$.

In a sense, we have just shown global stability by extending local stability 
from small enough neighborhoods to larger intervals. For local stability, 
we approximated a difference equation by a linear difference equation. 
Said another way, we try to bound a curve locally above and below by a 
straight line. If this straight line has slope at most 1 in absolute value, we 
may be able to use the line to show local stability. In our global stability 
example, we bounded the function $f(x)$ above and below by the straight
line $2 - x$. Since this line has slope $-1$, following the difference
equation $x_{t+1} = 2 - x_t$ for two steps brings us back to the same
point. But following $x_{t+1} = f(x_t)$ for two steps will bring us nearer
to the fixed point. Since the bounding by $2 - x$ holds over a large
interval, we can argue that the solutions to the nonlinear difference
equation will converge to the fixed point. In our specific example, the
form of $f(x)$ made checking the bounding easy, but the bounding was all
that was needed for global stability. We summarize this discussion with
the following theorem.


Theorem 10.5.2. Let $f(x)$ be continuous and positive on $(0, \infty)$.
If $x < f(x) < 2 - x$ for $x$ in $(0, 1)$ and if $x > f(x) > 2 - x$ for
$x > 1$, then $x = 1$ is globally stable for $(0, \infty)$.
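
For $f(x) = xe^{2(1-x)}$, the hypotheses of Theorem 10.5.2 and its conclusion can be spot-checked numerically. A grid check like the one below is evidence, not a proof. The grid stays away from $x = 1$, where $f(x)$ and $2 - x$ differ only by terms of order $(x - 1)^3$ and floating-point rounding would swamp the comparison. Because $f'(1) = -1$, convergence is slow, so many iterations are used. The helper names are ours.

```python
import math


def f(x: float) -> float:
    return x * math.exp(2 * (1 - x))


def satisfies_bounds(x: float) -> bool:
    """The hypotheses of Theorem 10.5.2 at a single point x > 0, x != 1."""
    if x < 1:
        return x < f(x) < 2 - x
    return 2 - x < f(x) < x


grid = [k / 100 for k in range(1, 96)] + [k / 100 for k in range(105, 1001)]
print(all(satisfies_bounds(x) for x in grid))  # True on this grid


def distance_from_one(x0: float, steps: int = 100_000) -> float:
    x = x0
    for _ in range(steps):
        x = f(x)
    return abs(x - 1)


for x0 in (0.05, 0.5, 1.5, 3.0, 8.0):
    print(x0, distance_from_one(x0))  # small but not tiny: convergence is slow
```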
In the next section, we consider a class of functions that generalize the
linear functions and show that global stability for the difference
equation $x_{t+1} = f(x_t)$ can be shown by bounding $f(x)$ by one of the
functions from this special class.


10.6 Linear Fractional Recurrences 


Among the simplest nonlinear recurrences are the linear fractional recur- 
rences, which are simply the ratios of two linear recurrences. These recur- 
rences are worth studying here because: 

(a) they are simple; 

(b) they give examples of how nonlinear recurrences can work; 

(c) techniques from linear recurrences can be used; 

(d) they can be used to study other nonlinear recurrences. 


We write a linear fractional recurrence in the form

$$x_{t+1} = \frac{a x_t + b}{c x_t + d},$$

where we assume that $a, b, c, d$ are real constants and that $x_t$ is a real vari-
able. Later we will consider the special case in which the constants and
initial values are rational numbers. The usual questions asked about non-
linear systems include the existence of fixed points, the existence of cycles
of various lengths, the asymptotic behavior of the system (e.g., are the fixed
points or cycles attractive in some sense?), local and global stability of fixed
points and cycles, chaos or chaotic-like behavior, and average behavior for
a distribution of initial conditions. Pleasantly enough, all these questions
can be answered in a relatively easy fashion for linear fractional systems.

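First we should note that a linear fractional system degenerates into a
linear one when $c = 0$ (after dividing the numerator and denominator by $d$,
we may take $d = 1$), since

$$x_{t+1} = \frac{a x_t + b}{0 \cdot x_t + 1} = a x_t + b. \qquad (10.4)$$
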

As we've already seen, the solution to such a linear recurrence is

$$x_t = a^t x_0 + b\sum_{i=0}^{t-1} a^i = \begin{cases} x_0 + bt & \text{if } a = 1,\\[4pt] a^t x_0 + b\,\dfrac{1 - a^t}{1 - a} & \text{if } a \ne 1. \end{cases}$$

It is usual to analyze such a system in terms of the size of $a$. Generally,
when $|a| > 1$, the solutions increase in absolute value, while when $|a| < 1$,
the solutions decrease in absolute value. In fact, using the change of variable
$y = x + \frac{b}{a - 1}$, the linear system (10.4) can be rewritten as $y_{t+1} = a y_t$, and
the analysis in terms of growth rate is perfectly appropriate. Of course,
this transformation does not make sense when $a = 1$, since in the original
(10.4) all solutions diverge (when $b \ne 0$), while in the transformed system every point is
a fixed point.
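The closed-form solution and the change of variable above can be checked with exact rational arithmetic. In this sketch the function names `closed_form` and `iterate` are hypothetical, and the parameter values are arbitrary choices made for illustration.

```python
from fractions import Fraction


def closed_form(a, b, x0, t):
    """Closed-form solution of the linear recurrence x_{t+1} = a x_t + b, i.e. (10.4)."""
    if a == 1:
        return x0 + b * t
    return a**t * x0 + b * (1 - a**t) / (1 - a)


def iterate(a, b, x0, t):
    """Apply x -> a x + b exactly t times."""
    x = x0
    for _ in range(t):
        x = a * x + b
    return x


# Exact rational arithmetic, so the two computations must agree exactly.
b, x0 = Fraction(2, 3), Fraction(5)
for a in [Fraction(1, 2), Fraction(-1), Fraction(1), Fraction(3)]:
    assert all(closed_form(a, b, x0, t) == iterate(a, b, x0, t) for t in range(12))

# The shift y = x + b/(a - 1) turns (10.4) into y_{t+1} = a y_t when a != 1.
a, x = Fraction(3, 4), Fraction(7)
shift = b / (a - 1)
print((a * x + b) + shift == a * (x + shift))   # True
```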

If we analyze (10.4) as a nonlinear system, we get a different picture.
First, there is a fixed point at $-b/(a-1)$, which is attractive when $|a| < 1$
for all real initial conditions. If we extend the real numbers by including
$\infty$, the linear system always has $\infty$ as a fixed point, which is attractive
when $|a| > 1$. In summary, a linear system can be seen to have two fixed
points, one at $-b/(a-1)$ and the other at $\infty$. Which one is attractive
depends on whether $|a| < 1$ or $|a| > 1$. For the case $a = 1$ (with $b \ne 0$), the two fixed
points degenerate into a single fixed point at $\infty$, which is an attractive
fixed point. When one of the two fixed points is attractive, the system
exponentially converges to that fixed point, but the convergence is only
linear when there is one fixed point.

Periodic behavior occurs when a = —1, since all non-fixed points have 
period 2, and the 2-cycles are not attractive. They are sometimes called 
neutrally stable because if the system starts at a point near a cycle, the 
system always stays near the cycle but the system does not approach any 
closer to the cycle. For the very special case a = 1 and b = 0, every point 
becomes a neutrally stable fixed point. Finally, in the extremely singular 
case $a = 0$, all trajectories go to $b$ in one step, and $b$ is called a superstable
point. 

Behavior identical to the linear system (10.4) can also be found in the
nonlinear system

$$x_{t+1} = \frac{x_t}{b x_t + a}.$$

The only difference is that $x$ has been replaced by $1/x$, and so the fixed
points have been shifted from $-b/(a-1)$ and $\infty$ to $-(a-1)/b$ and $0$. This
observation suggests that linear fractional systems may have many features
in common with linear ones, but that their analysis depends on functions
of several parameters and that these functions should be invariant under
such transformations as taking reciprocals of the system's variable.


10.6.1 Asymptotic behavior 


As for other nonlinear systems, the first step in analyzing linear fractional 
systems is to find their fixed points and determine their stability. Because 
linear fractional systems are closely related to linear systems, stability will 
depend on the eigenvalues of an associated matrix. Since the fixed points 
and eigenvalues may be irrational even when the linear fractional system’s 
parameters are rational, we will make use of two simpler functions of the 
parameters. These are the determinant, $\mathrm{Det} = ad - bc$, and the
discriminant, $\mathrm{Disc} = (a - d)^2 + 4bc$.
The fixed-point equation is

$$p = \frac{a p + b}{c p + d}.$$

This is equivalent to the polynomial equation $cp^2 + (d - a)p - b = 0$.
By the quadratic formula, the roots of this equation are

$$p = \frac{1}{2c}\left[a - d \pm \sqrt{(a - d)^2 + 4bc}\,\right] = \frac{1}{2c}\left[a - d \pm \sqrt{(a + d)^2 - 4(ad - bc)}\,\right].$$


So, the number of fixed points will depend on the discriminant, Disc.
Let us now assume that the linear fractional system is nonlinear,

$$x_{t+1} = \frac{a x_t + b}{c x_t + d}, \quad \text{where } c \ne 0.$$

When the determinant $\mathrm{Det} = ad - bc$ of the recurrence is zero, we can use
$ad = bc$ to get

$$f(x) = \frac{ax + b}{cx + d} = \frac{acdx + bcd}{cd(cx + d)} = \frac{bc(cx + d)}{cd(cx + d)} = \frac{b}{d}.$$

In one iteration all trajectories go to the superstable point $b/d$. (The super-
stable point is $a/c$ if $d = 0$.) So, the determinant tells us something about
stability.

We now know what happens when Det is zero. We still have to investigate 
the behavior when Det is positive, and when Det is negative. 


From the fixed point equation $p = \frac{1}{2c}\left[a - d \pm \sqrt{(a + d)^2 - 4(ad - bc)}\,\right]$,
we see that if Det is negative, then Disc is positive, and this equation will 
have two real roots. On the other hand, if Det is positive, the sign of Disc 
is not determined. If Disc is positive, there will be two real fixed points. If 
Disc is zero, there is only one fixed point. (In essence, the two fixed points 
have coalesced into one point.) If Disc is negative, there are no real fixed 
points. (In this case, there are two complex fixed points, but we will not 
see them because all our operations are in the reals.) 
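The case analysis above translates directly into a few lines of code. The sketch below, with the hypothetical function name `classify`, takes $c \ne 0$ as in the text and uses floating point for the square root. The four example maps are our own choices, one for each case; $(2x + 4)/(x + 2)$ in particular was picked to have $\mathrm{Det} = 0$.

```python
import math


def classify(a, b, c, d):
    """Fixed points of f(x) = (ax + b)/(cx + d), c != 0, sorted by Det and Disc."""
    det = a * d - b * c
    disc = (a - d) ** 2 + 4 * b * c      # equals (a + d)**2 - 4*det
    if det == 0:
        # f is constant: every point goes to b/d (or to a/c when d == 0) in one step
        return "superstable", [b / d if d != 0 else a / c]
    if disc > 0:
        r = math.sqrt(disc)
        return "two real fixed points", [(a - d - r) / (2 * c), (a - d + r) / (2 * c)]
    if disc == 0:
        return "one fixed point", [(a - d) / (2 * c)]
    return "no real fixed points", []


print(classify(1, 1, 1, 0))    # (x + 1)/x: fixed points (1 - sqrt 5)/2 and (1 + sqrt 5)/2
print(classify(1, 0, 1, 1))    # x/(x + 1): one fixed point, 0
print(classify(1, -5, 1, 1))   # (x - 5)/(x + 1): Disc = -20, no real fixed points
print(classify(2, 4, 1, 2))    # (2x + 4)/(x + 2): Det = 0, every point goes to 2
```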

A linear fractional system can be represented in a linear fashion by

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_0 \\ 1 \end{pmatrix},$$

where $x_0$ is the initial point and the ratio of the components of the produced
vector gives the value of $x_1$. So, $x_n$ can be computed by performing the
linear iteration

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{n}\begin{pmatrix} x_0 \\ 1 \end{pmatrix}$$

and then taking a ratio of components to obtain $x_n$. The eigenvalues of
the matrix are called the eigenvalues of the linear fractional system, and
they are the roots of the equation $\lambda^2 - (a + d)\lambda + ad - bc = 0$. Notice that
the discriminant of this polynomial is $(a + d)^2 - 4(ad - bc)$, which equals
our previously defined Disc. So, if Disc is positive, the matrix will have
two distinct real eigenvalues. As above, $\mathrm{Det} < 0$ implies Disc is positive,
and also implies that there is one positive and one negative eigenvalue. The
extra assumption that $a + d \ne 0$ implies that the eigenvalues have different
absolute values. On the other hand, when $a + d = 0$, the two eigenvalues
will have the same absolute value: one eigenvalue will be the negative of
the other eigenvalue.

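Let us assume that Disc is positive. Since the eigenvalues $\lambda_1$ and $\lambda_2$ are distinct,
powers of the matrix can be computed by diagonalizing the matrix, and
the $n$th linear iteration is calculated (for $b \ne 0$) as

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{n}\begin{pmatrix} x_0 \\ 1 \end{pmatrix} = \frac{1}{b(\lambda_2 - \lambda_1)}\begin{pmatrix} b & b \\ \lambda_1 - a & \lambda_2 - a \end{pmatrix}\begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix}\begin{pmatrix} \lambda_2 - a & -b \\ a - \lambda_1 & b \end{pmatrix}\begin{pmatrix} x_0 \\ 1 \end{pmatrix}.$$

If we assume that $|\lambda_1| > |\lambda_2|$, then taking a ratio and a limit gives

$$\lim_{n\to\infty} x_n = \frac{b}{\lambda_1 - a}$$
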
unless $x_0 = b/(\lambda_2 - a)$. It is easy to check that these two points are in
fact the fixed points, and so the linear fractional system has one unstable
fixed point and one globally attractive fixed point when $\mathrm{Disc} > 0$ and
$|\lambda_1| > |\lambda_2|$. (The given formulas are indeterminate when $b = 0$, but then
they can be written in the equivalent forms $(\lambda_1 - d)/c$ and $(\lambda_2 - d)/c$.) A
special case arises when $\mathrm{Disc} > 0$ and $|\lambda_1| = |\lambda_2|$, but for this to occur,
both $\mathrm{Det} < 0$ and $a + d = 0$ are required. For this special case, a simple
calculation shows that $f(f(x)) = x$, which means that every point except
for the two fixed points will have period 2, and of course, neither of the
fixed points will be attractive.
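The matrix representation and the limit formula can be checked numerically. The sketch below, with hypothetical helper names, computes $x_n$ exactly as a ratio of the components of $A^n (x_0, 1)^T$. It compares that value with direct iteration, then compares the limit with $b/(\lambda_1 - a)$ (or with $(\lambda_1 - d)/c$ when $b = 0$). It assumes $\mathrm{Disc} > 0$ and $|\lambda_1| \ne |\lambda_2|$. The example maps $(x + 1)/x$ (Figure 10.4) and $2x/(x + 1)$ are our own picks, and their orbits from the chosen starting values stay positive, so the pole is never reached.

```python
from fractions import Fraction
import math


def mat_mul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[P[0][0] * Q[0][0] + P[0][1] * Q[1][0], P[0][0] * Q[0][1] + P[0][1] * Q[1][1]],
            [P[1][0] * Q[0][0] + P[1][1] * Q[1][0], P[1][0] * Q[0][1] + P[1][1] * Q[1][1]]]


def via_matrix(a, b, c, d, x0, n):
    """x_n as the ratio of the components of A^n (x0, 1)^T."""
    M = [[1, 0], [0, 1]]
    for _ in range(n):
        M = mat_mul(M, [[a, b], [c, d]])
    return (M[0][0] * x0 + M[0][1]) / (M[1][0] * x0 + M[1][1])


def via_iteration(a, b, c, d, x0, n):
    """x_n by applying f(x) = (ax + b)/(cx + d) n times."""
    x = x0
    for _ in range(n):
        x = (a * x + b) / (c * x + d)
    return x


def limit_point(a, b, c, d):
    """b/(lambda1 - a), or (lambda1 - d)/c if b == 0, where |lambda1| > |lambda2| (Disc > 0)."""
    r = math.sqrt((a - d) ** 2 + 4 * b * c)
    lam1, lam2 = (a + d + r) / 2, (a + d - r) / 2
    if abs(lam2) > abs(lam1):
        lam1 = lam2
    return b / (lam1 - a) if b != 0 else (lam1 - d) / c


x0 = Fraction(3)
for n in range(15):   # f(x) = (x + 1)/x, i.e. a, b, c, d = 1, 1, 1, 0
    assert via_matrix(1, 1, 1, 0, x0, n) == via_iteration(1, 1, 1, 0, x0, n)
print(float(via_matrix(1, 1, 1, 0, x0, 40)), limit_point(1, 1, 1, 0))  # both about 1.618
```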

A more interesting case occurs when Disc is zero, which means that
there is only one fixed point. Geometrically, this says that the line $y = x$ is
tangent to the curve $y = f(x)$ and the point of tangency is the fixed point.
It can be seen (see Figure 10.5) that every point above the fixed point
iterates to a point still above but nearer to the fixed point. A point
below the fixed point first iterates away from the fixed point, but eventually one
of its iterates jumps across the discontinuity and then jumps to a point
above the fixed point. As an example, consider $f(x) = x/(x + 1)$. (Refer to
Figure 10.5.) Here, $x_n = x_0/(n x_0 + 1)$, and every trajectory converges to
the fixed point 0. The jumping between branches is hidden by this formula,
but can be displayed by following an example. The trajectory starting at
$-3/4$ gives $-3/4, -3, 3/2$ and then converges through positive values to 0.
Notice that the convergence is different in the one and two fixed point cases.
FIGURE 10.4. A plot showing the linear fractional system $x_{t+1} = (x_t + 1)/x_t$
with two fixed points. All iterates converge to the upper fixed point.


For two fixed points, $x_n = p + O(\gamma^n)$, where $p$ is the stable fixed point and
$|\gamma| < 1$, while in the one fixed-point case, $x_n = p + O(1/n)$. Figure 10.4
shows the geometry of a two fixed-point case and Figure 10.5 shows the 
geometry of a one fixed-point case. 
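The two convergence rates can be seen side by side with the two maps of Figures 10.4 and 10.5. For $(x + 1)/x$, successive errors shrink by a factor close to $\gamma = \lambda_2/\lambda_1 = -1/p^2$. For $x/(x + 1)$, the sketch checks the formula $x_n = x_0/(n x_0 + 1)$ exactly along the trajectory from $-3/4$ used in the text. The starting value 3.0 and the number of steps are arbitrary choices.

```python
import math
from fractions import Fraction

# Two fixed points: f(x) = (x + 1)/x, stable fixed point p = (1 + sqrt 5)/2.
p = (1 + math.sqrt(5)) / 2
x, errors = 3.0, []
for n in range(16):
    errors.append(x - p)
    x = (x + 1) / x
# Successive errors shrink by a factor close to gamma = lambda2/lambda1 = -1/p**2
print(errors[15] / errors[14], -1 / p**2)

# One fixed point: f(x) = x/(x + 1), with x_n = x0/(n x0 + 1), so x_n -> 0 like 1/n.
x0 = Fraction(-3, 4)
x, trajectory = x0, []
for n in range(1, 8):
    x = x / (x + 1)
    assert x == x0 / (n * x0 + 1)          # closed form from the text
    trajectory.append(x)
print(trajectory[:3])                      # [Fraction(-3, 1), Fraction(3, 2), Fraction(3, 5)]
print(float(1000 * x0 / (1000 * x0 + 1)))  # n * x_n is close to 1 for large n
```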


FIGURE 10.5. A plot showing the linear fractional system $x_{t+1} = x_t/(x_t + 1)$
with one fixed point. Convergence to the fixed point is from above.
So far, linear fractional systems have not behaved very differently from
linear systems. The remaining cases, in which Disc is negative (and so
$4\,\mathrm{Det} = (a + d)^2 - \mathrm{Disc}$ is positive), will display more nonlinear behavior.
We will discuss these cases in the following subsections.


10.6.2 Rational coefficients and periodicity 


Like other nonlinear systems, linear fractional systems can display periodic 
behavior. But unlike other systems, linear fractional systems do not allow 
the co-existence of cycles of different lengths. 


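Theorem 10.6.1. If for a linear fractional map $f$ there exists a point $x$
such that $f^{(K)}(x) = x$, then either $x$ is a fixed point (period $K = 1$) or
$f^{(K)}(y) = y$ for all $y$ (all points are periodic).


Proof. As above, the linear fractional map $f(x) = \frac{ax + b}{cx + d}$ can be considered
as a $2 \times 2$ matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ acting on a vector $\begin{pmatrix} x \\ 1 \end{pmatrix}$. Since $A$ is a
$2 \times 2$ matrix, its characteristic polynomial is quadratic, so by the Cayley-Hamilton
theorem $A^2 = (a + d)A - (ad - bc)I$. By induction, for every natural
number $K$ there exist scalars $\alpha_K$ and $\beta_K$ such that $A^K = \alpha_K A + \beta_K I$,
and

$$A^K\begin{pmatrix} x \\ 1 \end{pmatrix} = \alpha_K A\begin{pmatrix} x \\ 1 \end{pmatrix} + \beta_K\begin{pmatrix} x \\ 1 \end{pmatrix}.$$

If $f^{(K)}(x) = x$, then

$$\frac{\alpha_K(ax + b) + \beta_K x}{\alpha_K(cx + d) + \beta_K} = x.$$

There are two possibilities for this equation:
(a) $\alpha_K = 0$, and so $A^K = \beta_K I$, and for every $y$, $A^K\begin{pmatrix} y \\ 1 \end{pmatrix} = \beta_K\begin{pmatrix} y \\ 1 \end{pmatrix}$,
which is equivalent to $\beta_K y/\beta_K = y$, so all points have period $K$;
(b) $\alpha_K \ne 0$, and then $ax + b = (cx + d)x$, which is the equation for fixed
points.
□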
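The proof rests on writing $A^K = \alpha_K A + \beta_K I$. From $A^2 = TA - DI$, with $T = a + d$ and $D = ad - bc$, one gets the recurrence $\alpha_{K+1} = T\alpha_K + \beta_K$, $\beta_{K+1} = -D\alpha_K$, starting from $\alpha_1 = 1$, $\beta_1 = 0$. The sketch below uses that recurrence to find the least $K$ with $\alpha_K = 0$, which is case (a) of the proof. The function names are hypothetical, and the test assumes $A$ is not already a multiple of $I$.

```python
def ch_coefficients(a, b, c, d, K):
    """(alpha_K, beta_K) with A^K = alpha_K A + beta_K I, for A = [[a, b], [c, d]].

    Cayley-Hamilton gives A^2 = T A - D I with T = a + d and D = ad - bc, hence
    alpha_{k+1} = T alpha_k + beta_k and beta_{k+1} = -D alpha_k, from A^1 = 1*A + 0*I.
    """
    T, D = a + d, a * d - b * c
    alpha, beta = 1, 0
    for _ in range(K - 1):
        alpha, beta = T * alpha + beta, -D * alpha
    return alpha, beta


def least_period(a, b, c, d, max_K=12):
    """Least K <= max_K with alpha_K == 0, i.e. A^K a multiple of I (case (a) of the proof).

    Assumes A is not already a multiple of I (for f(x) = x every point has period 1).
    """
    for K in range(1, max_K + 1):
        if ch_coefficients(a, b, c, d, K)[0] == 0:
            return K
    return None


for name, coeffs in [("(x - 1)/x", (1, -1, 1, 0)), ("(x - 1)/(x + 1)", (1, -1, 1, 1)),
                     ("(2x - 1)/(x + 1)", (2, -1, 1, 1)), ("(x - 5)/(x + 1)", (1, -5, 1, 1))]:
    print(name, least_period(*coeffs))   # 3, 4, 6, and None
```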


This theorem allows for the co-existence of periodic points and fixed 
points. When the linear fractional system has real eigenvalues, two real 
fixed points will occur, and periodic behavior will occur when the two 
eigenvalues have the same magnitude. In this case, all non-fixed points will 
have period 2. 

When the eigenvalues are complex, there will be no real fixed points,
and the magnitudes of the eigenvalues are forced to be equal. Here, let
$e^{i\theta} = \lambda_1/\lambda_2$. If $\theta$ is a rational multiple of $\pi$, then $e^{i\theta}$ is a root of unity,
and there will be a least positive integer $K$ such that $A^K = \beta I$ for some scalar $\beta$. In this
case, every point will be periodic and have period $K$. If $\theta$ is not a rational
multiple of $\pi$, the system will not be periodic, and all points will fail to be
periodic. We will consider this situation in more detail in the next section.

In general, all values of $\theta$ are possible, since a direct calculation of $\lambda_1/\lambda_2$
shows that any desired complex number of norm 1 can be produced by 
appropriate choice of the parameters a, b,c, d for the linear fractional map. 
Of course, these parameters can be chosen as real numbers, but in realistic 
uses of linear fractional systems one would like to assume that the param- 
eters have finite representations. In particular, one might like to assume 
that these parameters are rational, and then of course the same linear frac- 
tional map can be represented using integer parameters. We would like to 
know which periods are possible for linear fractional systems with rational 
parameters. The following theorem gives the answer. 


Theorem 10.6.2. A linear fractional system with rational (or integer) 
parameters can only have periods 1, 2, 3, 4, and 6, and there are examples 
of rational linear fractional systems with each of these periods. 


Proof. Let $\gamma = \lambda_1/\lambda_2$, the ratio of the eigenvalues. Then $\gamma$ is a root of the
polynomial $\gamma^2 - e\gamma + 1$, where $e = (a + d)^2/(ad - bc) - 2$ is a rational function of $a, b, c, d$. If the linear
fractional system has period $K$, then $\gamma$ is a primitive $K$th root of unity;
that is, $\gamma^K = 1$ but $\gamma^J \ne 1$ for all $0 < J < K$. For each natural number $K$,
it can be shown [141, Section 1.2] that the cyclotomic polynomial

$$\Phi_K(x) = \prod (x - \zeta), \quad \text{where } \zeta \text{ ranges over all primitive } K\text{th roots of unity},$$

has integer coefficients and its degree is $\phi(K)$, where $\phi$ is Euler's Phi Func-
tion used in Chapter 8. Further, this polynomial is minimal in the sense
that it does not have any smaller-degree rational polynomial factors.

Recalling that $\phi(K)$ counts the number of positive integers $i \le K$ with
$\gcd(K, i) = 1$, it can be shown that every integer $K > 6$ has $\phi(K) \ge 3$,
because 1 and $K - 1$ and a third number are relatively prime to $K$. We
find this third number in each of three cases. One, if $K$ is odd and $K > 5$,
then 4 is also relatively prime to $K$. Two, if $K > 6$ and $K = 2m$ with $m$
odd, then $m - 2$ is relatively prime to $K$ because

$$\gcd(2m, m - 2) = \gcd(m - 2, 4) = 1,$$

since $m - 2$ is odd. (We need $K > 6$ because for $K = 6$, $m - 2 = 1$.) Three,
if $K > 6$ and $K$ is a multiple of 4, then $K/2 - 1$ is relatively prime to $K$
because
$$\gcd(K, K/2 - 1) = \gcd(K/2 - 1, 2) = 1,$$
since $K/2 - 1$ is odd.
Returning to the problem at hand, $\gamma$ is a root of the cyclotomic polynomial
$\Phi_K(x)$, and $\gamma$ also satisfies a rational quadratic polynomial. The
minimality of $\Phi_K$ implies that $\phi(K)$ must equal 1 or 2. Hence, the only
possible values for $K$ are 1, 2, 3, 4, 6, since we have ruled out all $K > 6$, and
it's easy to check that $\phi(5) = 4$.
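The counting argument can be confirmed by brute force over a finite range. This sketch computes Euler's phi directly from its definition. It checks that the "third number" chosen in each of the three cases is coprime to $K$ and differs from 1 and $K - 1$, then lists the $K$ with $\phi(K) \le 2$. The bound 1000 is an arbitrary choice, and the function names are hypothetical.

```python
from math import gcd


def phi(K):
    """Euler's phi function: the number of 1 <= i <= K with gcd(K, i) == 1."""
    return sum(1 for i in range(1, K + 1) if gcd(K, i) == 1)


def third_coprime(K):
    """For K > 6, the number (besides 1 and K - 1) that the proof shows is coprime to K."""
    if K % 2 == 1:
        return 4                 # K odd (and K > 5)
    if K % 4 == 2:
        return K // 2 - 2        # K = 2m with m odd: use m - 2
    return K // 2 - 1            # K a multiple of 4: use K/2 - 1


for K in range(7, 1001):
    w = third_coprime(K)
    assert gcd(K, w) == 1 and 1 < w < K - 1

print([K for K in range(1, 1001) if phi(K) <= 2])   # [1, 2, 3, 4, 6]
```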

We now give an example of a rational linear fractional system for each 
of these five periods. 


Period 1: $f(x) = x$. This is a degenerate linear fractional system in which
all points have period one.


Period 2: For $f(x) = (x - 2)/(2x - 1)$ there are complex eigenvalues and
no fixed points. An example period is $2 \to 0$.


For $f(x) = (5x - 2)/(4x - 5)$, there are real eigenvalues and two fixed
points at $(5 \pm \sqrt{17})/4$. All other points have period 2, for example

$$1 \leftrightarrow -3.$$


Period 3: $f(x) = (x - 1)/x$.
Example period is $3 \to 2/3 \to -1/2$.


Period 4: $f(x) = (x - 1)/(x + 1)$.
Example period is $2 \to 1/3 \to -1/2 \to -3$.


Period 6: $f(x) = (2x - 1)/(x + 1)$.
Example period is $3 \to 5/4 \to 2/3 \to 1/5 \to -1/2 \to -4$.

A pleasant outcome of this analysis is that it is easy to test for periodicity 
in rational linear fractional systems. One simply tries an initial condition 
and checks to see whether it gives periodic behavior of length at most 6. The 
only minor problem is that in the case of real eigenvalues one could chance 
on a fixed point and other points would need to be tested for periodicity. 
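The test described above is easy to automate with exact rational arithmetic. In the sketch below, `period_of` is a hypothetical helper. It returns the least $K \le 6$ with $f^{(K)}(x_0) = x_0$, or `None`. It is applied to the example maps above and to $(x - 5)/(x + 1)$, which has complex eigenvalues whose ratio is not a root of unity.

```python
from fractions import Fraction


def period_of(a, b, c, d, x0, max_period=6):
    """Least K <= max_period with f^(K)(x0) == x0 for f(x) = (ax + b)/(cx + d).

    Uses exact rational arithmetic. Returns None if no such K exists, or if the
    orbit hits the pole x = -d/c.
    """
    start = x = Fraction(x0)
    for K in range(1, max_period + 1):
        denom = c * x + d
        if denom == 0:
            return None
        x = (a * x + b) / denom
        if x == start:
            return K
    return None


examples = [
    ("x", (1, 0, 0, 1), 7),
    ("(x - 2)/(2x - 1)", (1, -2, 2, -1), 2),
    ("(5x - 2)/(4x - 5)", (5, -2, 4, -5), 1),
    ("(x - 1)/x", (1, -1, 1, 0), 3),
    ("(x - 1)/(x + 1)", (1, -1, 1, 1), 2),
    ("(2x - 1)/(x + 1)", (2, -1, 1, 1), 3),
    ("(x - 5)/(x + 1)", (1, -5, 1, 1), 0),
]
for name, (a, b, c, d), x0 in examples:
    print(name, period_of(a, b, c, d, x0))   # 1, 2, 2, 3, 4, 6, None
```

As the text notes, a `None` result could also come from starting at a fixed point, so in the case of real eigenvalues a second starting value should be tried.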


10.6.3 Chaotic-like behavior


Three characteristics of chaos are: 

(a) cycles of every period; 

(b) sensitive dependence on initial conditions; 

(c) for every open set $A$ and every open set $B$ there is an $x_0 \in A$ and a
$K \in \mathbb{N}$ such that $f^{(K)}(x_0) \in B$.

For linear fractional systems, (a) is simply false, but we will see that (b) 
and (c) do hold when the ratio of the eigenvalues is not a root of unity. 
Recall that sensitive dependence means that regardless of how close two 
different trajectories are when they begin, eventually they are far apart. 
For linear fractional systems, the pole at $p = -d/c$ forces trajectories to
diverge from one another. For example, if one chose to consider trajectories
starting at $p - \epsilon$ and $p + \epsilon$, then, since $f(p \pm \epsilon) = a/c \mp \mathrm{Det}/(c^2\epsilon)$, we get
$|f(p + \epsilon) - f(p - \epsilon)| = 2|\mathrm{Det}/(c^2\epsilon)|$, a quantity
that can be made as large as one likes by taking $\epsilon$ small. For some linear
fractional systems, this is not a problem because trajectories are attracted
to stable fixed points and any initial divergence disappears in the long term. 
For periodic linear fractional systems, an initial difference is in essence 
maintained and will not increase. But we will see that there are linear 
fractional systems with complex eigenvalues for which the dependence on 
initial conditions does not die out, since every trajectory eventually comes 
close to the pole and then will be thrown far away. So two trajectories that 
start close together are eventually far apart, and there is no tendency for 
them to come close again. To see what happens in this case, we first show 
that (c) holds. For this, a sequence $(S_k)$ is called a source for $A$ if for every
$a \in A$ and for every $\epsilon > 0$ there is a $K$ such that $|a - S_K| < \epsilon$.
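The pole computation above can be checked exactly. This sketch uses $f(x) = (x - 5)/(x + 1)$, for which $p = -1$ and $\mathrm{Det} = 6$. That choice of map is ours, made for illustration. With $\epsilon = 1/20$ the two starting points are $-1.05$ and $-0.95$.

```python
from fractions import Fraction


def f(x, a=1, b=-5, c=1, d=1):
    # Default parameters give f(x) = (x - 5)/(x + 1): pole p = -1, Det = 6
    return (a * x + b) / (c * x + d)


a, b, c, d = 1, -5, 1, 1
p, det = Fraction(-d, c), a * d - b * c
for eps in [Fraction(1, 20), Fraction(1, 1000)]:
    gap = abs(f(p + eps) - f(p - eps))
    # Exact check of |f(p + eps) - f(p - eps)| = 2 |Det / (c^2 eps)|
    assert gap == abs(2 * det / (c**2 * eps))
    print(eps, gap)   # 1/20 240, then 1/1000 12000
```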


Lemma 10.6.3. If $(S_k)$ is a source for $A$ and $g(x)$ is a function from $A$
onto $B$ such that every preimage has a neighborhood of continuity, then
$g((S_k))$ is a source for $B$.


Proof. If $w$ is the desired point in $B$ with desired closeness $\delta$, then since
$g$ is onto, there is a preimage $v \in A$ such that $g(v) = w$. Since there is
a neighborhood of continuity around $v$, there exists an $\epsilon > 0$ such that if
$|x - v| < \epsilon$, then $|g(x) - g(v)| = |g(x) - w| < \delta$. But $(S_k)$ is a source for $A$,
which means that there is a $K$ such that $|S_K - v| < \epsilon$ and so $|g(S_K) - w| < \delta$.
This proves that $g((S_k))$ is a source for $B$. □


Theorem 10.6.4. If $(x_n)$ is a trajectory of a linear fractional system in
which the ratio of its eigenvalues is $\lambda_1/\lambda_2 = e^{2i\theta}$, where $\theta$ is an irrational
multiple of $\pi$, then $(x_n)$ is a source for $(-\infty, \infty)$.


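Proof. Normalizing $a, b, c, d$ so that $\mathrm{Det} = 1$ (then the eigenvalues are $e^{\pm i\theta}$
and $a + d = 2\cos\theta$), the $n$th term of $(x_n)$ can be written as

$$x_n = \frac{-a x_0 - b + x_0[\cos\theta - \sin\theta\cot n\theta]}{a - c x_0 - [\cos\theta + \sin\theta\cot n\theta]}.$$

Since $\theta$ is an irrational multiple of $\pi$, the sequence $(n\theta/2\pi \bmod 1)$ is a
source for $[0, 1]$ (refer to [120, Chapter 3]). But the transformation from
$n\theta$ to $x_n$ satisfies the hypotheses of the lemma, and so $(x_n)$ is a source for
$(-\infty, \infty)$. □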
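The closed form for $x_n$ in this proof can be compared numerically with direct iteration. The sketch below first rescales $a, b, c, d$ by $\sqrt{\mathrm{Det}}$ so that $\mathrm{Det} = 1$ and $a + d = 2\cos\theta$. The map $(x - 5)/(x + 1)$ and the starting value 0.3 are our own choices. Because trajectories pass near the pole, the two results are compared through $\arctan$ modulo $\pi$ rather than directly. That comparison is a choice made for this sketch, not something taken from the text.

```python
import math


def closed_form(a, b, c, d, x0, n):
    """x_n from the cot(n*theta) formula of the proof.

    Assumes Det > 0 and complex eigenvalues; (a, b, c, d) are first divided by
    sqrt(Det) so that Det = 1 and a + d = 2 cos(theta).
    """
    s = math.sqrt(a * d - b * c)
    a, b, c, d = a / s, b / s, c / s, d / s
    theta = math.acos((a + d) / 2)
    cot = 1 / math.tan(n * theta)
    num = -a * x0 - b + x0 * (math.cos(theta) - math.sin(theta) * cot)
    den = a - c * x0 - (math.cos(theta) + math.sin(theta) * cot)
    return num / den


def iterate(a, b, c, d, x0, n):
    """x_n by applying f(x) = (ax + b)/(cx + d) n times."""
    x = x0
    for _ in range(n):
        x = (a * x + b) / (c * x + d)
    return x


def same_point(u, v, tol=1e-8):
    """Compare two reals through arctan modulo pi, which stays well conditioned
    even when a trajectory passes close to the pole."""
    diff = (math.atan(u) - math.atan(v)) % math.pi
    return min(diff, math.pi - diff) < tol


a, b, c, d, x0 = 1, -5, 1, 1, 0.3     # f(x) = (x - 5)/(x + 1)
print(all(same_point(closed_form(a, b, c, d, x0, n), iterate(a, b, c, d, x0, n))
          for n in range(1, 41)))     # True
```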


Hence, these linear fractional systems obey (c), and they also obey (b), 
because any two nearby trajectories eventually hit a small neighborhood of 
the pole and then are thrown far apart. 

Figure 10.6 shows the web diagram for the linear fractional map $f(x) =
(x - 5)/(x + 1)$. One can see that with the 100 iterates used in the diagram
calculation, almost all points are filled in. Figure 10.7 shows sensitive
dependence on initial conditions in that the two displayed trajectories are 
often very close but are occasionally far apart. 
FIGURE 10.6. A web plot of the model $(x - 5)/(x + 1)$, showing that most points are visited.


10.6.4 Invariant distributions


When a system has wandering trajectories that do not converge to cycles 
or fixed points, one might ask whether perhaps other properties are con- 
served. Natural questions to ask are how long a trajectory stays in a region 
and how often a trajectory visits a region. It may be difficult to answer 
these questions and perhaps unenlightening, because we want to know how 
the system behaves rather than how a specific trajectory behaves. So, we 
consider putting a probability distribution on the space and then asking 
how the distribution is changed by the system. In particular, we would like 
to know whether there is a very special distribution that remains the same 
after the system acts on it. If we could prepare enough copies of a system 
and start these copies in accord with this invariant distribution, then we 
would later (even a long time later) still see the same distribution of states. 
Although the copies that started in particular states are no longer in those 
states, other copies are in those states in proportion to the invariant density. 
We could also hope for some sort of convergence. It might be possible that 
the invariant distribution would be attractive, if not for all initial distri- 
butions, then at least for a class of initial distributions. By attractive here 
we mean that under some reasonable definition of distance between distri- 
butions, the distance between the system’s distribution at time t and the 
invariant distribution converges to 0 as t increases. We will show that linear 
fractional systems do have invariant distributions, but these distributions 
are not attractive. 
FIGURE 10.7. Two trajectories for $f(x) = (x - 5)/(x + 1)$. One trajectory starts at $-1.05$
and the other at $-0.95$. These two trajectories are often very close, but occasion-
ally they are far apart.


Although we spoke of probability distributions, it may be easier to work
with probability densities. A density function $g(x)$ is a nonnegative func-
tion defined on the real numbers with the property that every interval $(\alpha, \beta)$
has probability $\int_\alpha^\beta g(x)\,dx$. We normalize $g(x)$ to have $\int_{-\infty}^{\infty} g(x)\,dx = 1$.
Now let us see how the mass on the interval $(\alpha, \beta)$ is transformed by the
mapping $f$. First, the interval $(\alpha, \beta)$ is transformed to $(f(\alpha), f(\beta))$, and
then the mass that was on $(\alpha, \beta)$ is spread over this new interval. But if
$g(x)$ is an invariant density, then the density on the new interval must
also be $g(x)$. If we let $g_1(x)$ be the distribution on the new interval, we
have

$$\int_\alpha^\beta g(x)\,dx = \int_{f(\alpha)}^{f(\beta)} g_1(x)\,dx,$$

and using $x = f(y)$ and $dx = f'(y)\,dy$ this becomes

$$\int_\alpha^\beta g(x)\,dx = \int_\alpha^\beta g_1(f(y))\,f'(y)\,dy.$$

Since $\alpha$ and $\beta$ are arbitrary, differentiating with respect to $\beta$ gives $g(x) = g_1(f(x))\,f'(x)$,
at least for those $x$'s at which these functions are continuous. If $g_I(x)$ is an
invariant density, then $g_I(x) = g_I(f(x))\,f'(x)$. Since we want $g_I(x)$ to have a
finite integral over $(-\infty, \infty)$, $g_I(x)$ should be close to 0 when $x$ is large in
absolute value, and so it may be easier to look at $h(x) = 1/g_I(x)$. Then
$h(x) = h(f(x))/f'(x)$. Using a linear fractional map for $f$, which has been
normalized to have $\mathrm{Det} = ad - bc = 1$, so that $f'(x) = 1/(cx + d)^2$, then

$$h(x) = (cx + d)^2\, h\!\left(\frac{ax + b}{cx + d}\right).$$

Assuming that $h(x)$ has a power series, then $h(x)$ must be a polynomial of
degree 2. The coefficients of the polynomial can be found by solving a set
of linear equations. Up to an unknown scale factor, the unique solution to
the set of three linear equations gives

$$h(x) = cx^2 + (d - a)x - b,$$

and we obtain the following theorem.


Theorem 10.6.5. If $f(x)$ is a linear fractional map with complex eigen-
values, then there is a unique invariant density

$$g(x) = \frac{\gamma}{cx^2 + (d - a)x - b},$$

where $\gamma$ is determined from the normalization condition $\int_{-\infty}^{\infty} g(x)\,dx = 1$.


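The complex eigenvalue condition is equivalent to $g(x)$ having no real poles,
which in turn implies that $g(x)$ is integrable, and the normalization makes
sense.
Conveniently enough, the integral of the density function can also be
written in closed form as

$$\int_{-\infty}^{x} g(u)\,du = \frac{1}{\pi}\arctan\frac{2cx + d - a}{\sqrt{-(d - a)^2 - 4bc}} + \frac{1}{2},$$

taking $c > 0$ (which can always be arranged by changing the signs of all of $a, b, c, d$);
this corresponds to the normalization constant $\gamma = \sqrt{-\mathrm{Disc}}/(2\pi)$.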
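Both the invariance of $g$ and the closed-form integral can be checked in code. The invariance equation $g(x) = g(f(x))f'(x)$, with $g = 1/h$ and $f'(x) = \mathrm{Det}/(cx + d)^2$, is equivalent to the polynomial identity $h(f(x))(cx + d)^2 = \mathrm{Det}\cdot h(x)$. The sketch below tests that identity exactly at rational points, then compares a numerical derivative of the integral formula with $\gamma/h(x)$. The map $(x - 5)/(x + 1)$, the sample points, and the helper names are illustrative assumptions.

```python
from fractions import Fraction
import math


def h(x, a, b, c, d):
    """Reciprocal of the unnormalized invariant density: c x^2 + (d - a) x - b."""
    return c * x * x + (d - a) * x - b


def invariance_identity(a, b, c, d, x):
    """g(x) = g(f(x)) f'(x), with g = 1/h and f'(x) = Det/(cx + d)^2, is equivalent
    to h(f(x)) (cx + d)^2 == Det h(x); test this exactly at a rational x."""
    fx = (a * x + b) / (c * x + d)
    return h(fx, a, b, c, d) * (c * x + d) ** 2 == (a * d - b * c) * h(x, a, b, c, d)


def cdf(x, a, b, c, d):
    """Closed-form integral of the normalized density from -infinity to x (c > 0)."""
    root = math.sqrt(-((d - a) ** 2) - 4 * b * c)      # sqrt(-Disc)
    return math.atan((2 * c * x + d - a) / root) / math.pi + 0.5


a, b, c, d = 1, -5, 1, 1           # f(x) = (x - 5)/(x + 1); h(x) = x^2 + 5
xs = [Fraction(2 * k + 1, 4) for k in range(-10, 10)]   # avoids the pole x = -1
print(all(invariance_identity(a, b, c, d, x) for x in xs))   # True

# The derivative of the cdf should be gamma / h(x) with gamma = sqrt(-Disc)/(2 pi).
gamma = math.sqrt(-((a - d) ** 2 + 4 * b * c)) / (2 * math.pi)
x, step = 0.7, 1e-5
numeric = (cdf(x + step, a, b, c, d) - cdf(x - step, a, b, c, d)) / (2 * step)
print(numeric, gamma / h(x, a, b, c, d))
```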


If one could choose the states of an ensemble of systems to satisfy the
density $g_I(x)$, then as the states evolve under the application of $f$, the same
density must be maintained. One might hope that every density evolves to
$g_I(x)$ under $f$. The clue that this is not the case is in the theorem's condition
that all one needs for an invariant distribution is that the eigenvalues are
complex. But for instance, any linear fractional system with eigenvalues $i$
and $-i$ has period 2. For such a system, any initial distribution repeats
after two steps, and the density does not converge to the invariant density.
More specifically, one can prove the following theorem.


Theorem 10.6.6. For a linear fractional system with complex eigenval- 
ues, the invariant density is not attractive even within the class of den- 
sities whose reciprocals are quadratic polynomials. For such densities, the 
discriminant is a conserved (invariant) quantity. 


The proof of this theorem consists in showing that the coefficients of the
new density can be computed from the coefficients of the old density by
applying a matrix to the vector of coefficients. The invariant density cor-
responds to the 1-eigenvector (the eigenvector associated with the eigen-
value 1). One can then show that the other two eigenvalues of the matrix
also have absolute value equal to 1. Hence, even though a density converts
to a new density, the difference from the 1-eigenvector does not decrease,
and the 1-eigenvector corresponding to the invariant density is not attrac-
tive. In fact, one can also show that the discriminant is preserved. Starting
with a density that is the reciprocal of a quadratic polynomial and iterating
using $f$ results in a density of the same kind, and the discriminant of this
density is identical to the discriminant of the starting density.

What does a typical aperiodic trajectory look like? One way to describe 
such a trajectory is to use a histogram, that is, to break up the range into 
small bins and then count the number of times the trajectory visits each 
bin. One can then normalize by the number of iterates and hope that a 
limiting histogram exists. 

Let $(f^{(i)}(x_0))$ be the sequence of iterates of $f$ starting at $x_0$. If this
trajectory is well behaved, then there is an associated histogram $H$ such
that $H((a,b))$ is the frequency with which the trajectory visits the interval
$(a,b)$. Then

$$H((a,b)) = \lim_{N\to\infty} \frac{1}{N}\sum_{i=0}^{N-1} I_{(a,b)}\big[f^{(i)}(x_0)\big],$$

where $I_{(a,b)}[x]$ is the indicator function that gives 1 when its argument is in
$(a,b)$ and 0 when its argument is not in $(a,b)$. Assuming that these limits
exist and that $H$ is smooth, then

$$H(x) = \lim_{\epsilon\to 0} \frac{H((x - \epsilon, x + \epsilon))}{2\epsilon}$$

should exist and behave like a probability density.

Assuming that all of this is true, what should $H(x)$ look like? Since we are
looking at limiting behavior, it should not matter whether the trajectory
starts at $x_0$ or at $f(x_0)$. Hence, we expect the limiting histogram to be
an invariant density. But by Theorem 10.6.5 there is a unique invariant
density, so $H(x)$ should look like $g(x)$. Figure 10.8 shows a histogram for
1000 iterates of $(x - 5)/(x + 1)$. This histogram looks quite smooth and
agrees reasonably with the invariant density, which is proportional to $1/(x^2 + 5)$.
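A histogram experiment of this kind takes only a few lines. The sketch below iterates $(x - 5)/(x + 1)$ from 0.3 for 200,000 steps, which is many more than the 1000 used for Figure 10.8. It then compares the fraction of iterates in each of 20 bins on $[-5, 5)$ with the probability that the normalized invariant density $\sqrt{5}/(\pi(x^2 + 5))$ assigns to that bin, using the closed-form integral. The number of steps, the bins, and the starting value are arbitrary choices. The explicit handling of the pole is an implementation detail of this sketch.

```python
import math


def f(x):
    """f(x) = (x - 5)/(x + 1) on the reals extended by infinity."""
    if x == -1.0:
        return math.inf          # the pole goes to infinity
    if math.isinf(x):
        return 1.0               # infinity goes to a/c = 1
    return (x - 5) / (x + 1)


def G(x):
    """Cumulative distribution of the invariant density sqrt(5) / (pi (x^2 + 5))."""
    return math.atan(x / math.sqrt(5)) / math.pi + 0.5


N, width = 200_000, 0.5
edges = [-5 + width * k for k in range(21)]        # 20 bins covering [-5, 5)
counts = [0] * 20
x = 0.3
for _ in range(N):
    if -5 <= x < 5:
        counts[min(int((x + 5) / width), 19)] += 1
    x = f(x)

# Largest gap between observed bin frequency and the invariant-density probability
worst = max(abs(counts[k] / N - (G(edges[k + 1]) - G(edges[k]))) for k in range(20))
print(worst)
```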

These results tell us that in spite of a seemingly irregular trajectory, a 
fairly simple property is maintained. If one looks at any individual trajec- 
tory, then the long-run histogram should look like a fairly simple function. 
On the other hand, if one took multiple copies of the same linear fractional 
system, assigned initial conditions with the probabilities given by the his- 
togram, and then looked at the distribution of states after one or several 
time intervals, the probability density would still be the same as the initial 
density. This is a sort of ergodic theorem, which says that the average 
over one trajectory is the same as the appropriate average over an ensemble 
of systems. 

Notice that if one picked a density $g(x)$ and picked the initial conditions
for an ensemble according to $g(x)$, one would expect a different density, say
$g_1(x)$, to occur after one time step, and then densities $g_2(x), g_3(x), g_4(x), \ldots$
for subsequent time steps. There is no reason to expect this sequence to
FIGURE 10.8. A histogram for a trajectory of $(x - 5)/(x + 1)$ showing good
agreement with the density $1/(x^2 + 5)$.


converge to a single density. In fact, Theorem 10.6.6 says that this sequence
does not converge, but the sequence of averages does converge. That is, for
$g_0(x) = g(x)$ and $G_N(x) = \frac{1}{N}\sum_{i=0}^{N-1} g_i(x)$, we have $\lim_{N\to\infty} G_N(x) = g_I(x)$.


10.6.5 Proving global stability 


Our initial interest in linear fractional systems came from an application 
to population models. For some years, it has been known that the usual 
one-dimensional population models are globally stable exactly when they 
are locally stable [63, 32, 148]. We were pleased to find that the usual
population models were “enveloped” by linear fractional maps of the special
form

$$\phi(x) = \frac{1 - \alpha x}{\alpha - (2\alpha - 1)x}, \quad \text{where } \alpha \in [0, 1).$$

These special linear fractional systems all have period 2, and so by a vari-
ation on Sarkovskii's Theorem (refer to Theorem 10.2.1) we were able to
prove the following.


Theorem 10.6.7. Let $\phi(x)$ be a monotone decreasing function that is pos-
itive on $(0, x_\infty)$ and such that its second iterate is the identity; that is,
$\phi(\phi(x)) = x$. If $f(x)$ is a continuous function such that

• $\phi(x) > f(x)$ on $(0, 1)$,
• $\phi(x) < f(x)$ on $(1, x_\infty)$,
• $f(x) > x$ on $(0, 1)$,
• $f(x) < x$ on $(1, \infty)$,
• $f(x) > 0$ on $(1, \infty)$,

then for all $x \in (0, \infty)$, $\lim_{k\to\infty} f^{(k)}(x) = 1$.
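Two parts of this setting can be checked in code. First, every map $\phi$ of the special form fixes 1 and is its own inverse, and $\alpha = 1/2$ gives the line $2 - x$ of Theorem 10.5.2. Second, the sketch checks the hypotheses at grid points for one concrete case that we chose for illustration: the model $x e^{r(1 - x)}$ with $r = 1$, enveloped by $\phi(x) = 2 - x$, which is positive on $(0, 2)$. A grid check is only a sanity check, not a proof. The helper names are hypothetical.

```python
from fractions import Fraction
import math


def phi(x, alpha):
    """Enveloping map (1 - alpha x)/(alpha - (2 alpha - 1) x), with 0 <= alpha < 1."""
    return (1 - alpha * x) / (alpha - (2 * alpha - 1) * x)


# phi(1) = 1 and phi(phi(x)) = x, checked exactly at a few rational points.
for alpha in [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]:
    assert phi(Fraction(1), alpha) == 1
    for x in [Fraction(1, 3), Fraction(2, 5), Fraction(7, 6)]:
        assert phi(phi(x, alpha), alpha) == x
# alpha = 1/2 gives the straight line 2 - x used in Theorem 10.5.2.
assert phi(Fraction(1, 3), Fraction(1, 2)) == 2 - Fraction(1, 3)


def ricker(x, r=1.0):
    # Population model x e^{r(1 - x)}; r = 1 is our illustrative choice
    return x * math.exp(r * (1 - x))


def hypotheses_hold(f, env, x, x_inf):
    """Hypotheses of Theorem 10.6.7 at one point x (x != 1)."""
    fx = f(x)
    if x < 1:
        return env(x) > fx > x
    ok = x > fx > 0
    if x < x_inf:                  # phi is compared with f only where phi > 0
        ok = ok and env(x) < fx
    return ok


env = lambda x: 2 - x              # phi with alpha = 1/2, positive on (0, 2)
grid = [k / 1000 for k in range(1, 50001) if k != 1000]     # (0, 50] without x = 1
print(all(hypotheses_hold(ricker, env, x, 2.0) for x in grid))   # True

for x0 in [0.001, 0.5, 1.7, 50.0]:
    x = x0
    for _ in range(200):
        x = ricker(x)
    print(x0, x)                   # each orbit ends (numerically) at 1
```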


This theorem enabled us to show that local stability implies global sta- 
bility for the following seven population models: 


• $x_{t+1} = x_t e^{r(1 - x_t)}$,

• $x_{t+1} = x_t[1 + r(1 - x_t)]$,

• $x_{t+1} = x_t[1 - r\ln x_t]$,

¢ fi = tle — 4), 

© oy, = Stee 

ety = Oe, witha>0,b>0, 
© etl = THT 


Details of these results appear in [38, 39, 45]. 


10.6.6 Summary 


Linear fractional systems form a fairly simple class of nonlinear systems, 
yet they display many of the possible behaviors for nonlinear systems. In 
particular, they have both stable and unstable fixed points and exhibit 
both periodic and chaotic-like behavior. Table 10.1 gives a summary of the 
possible behaviors and the corresponding conditions on the parameters. 

In contrast to most nonlinear systems, linear fractional systems can be 
analyzed using techniques from elementary mathematics and linear sys- 
tems. Further, we have shown that linear fractional systems can be used 
to analyze more complicated systems, and we suggest that linear fractional 
systems should be a standard part of the toolbox for studying nonlinear 
systems. 
TABLE 10.1. The possible asymptotic behavior of a linear fractional system
f(x) = (ax + b)/(cx + d) (we assume c ≠ 0 to get a nonlinear system), with
Det = ad - bc and Disc = (a - d)^2 + 4bc.

Det = 0                          f(x) = b/d: superstable fixed point
Det < 0   Two real   a + d ≠ 0   One stable fixed point, one unstable fixed point;
          fixed                  convergence x_n = F.P. + O(λ^n), |λ| < 1
          points     a + d = 0   All points except the fixed points have period 2;
                                 neutral stability
Det > 0   Disc = 0               One globally stable fixed point;
                                 convergence x_n = F.P. + O(1/n)
          Disc > 0               One stable fixed point, one unstable fixed point;
                                 convergence x_n = F.P. + O(λ^n), |λ| < 1
          Disc < 0   Periodic    All points have the same period;
                                 rational coefficients: possible periods 1, 2, 3, 4, 6
                     Chaotic-    No periodic points; all open sets visited;
                     like        sensitive dependence on initial conditions;
                                 invariant density, non-attractive


10.7 Conclusion 


As we said at the beginning of this chapter, nonlinear systems can be non- 
linear in many different ways. We have considered some very simple types, 
and our examples have usually been one-dimensional. Realistic systems of- 
ten have high dimension, and the simple analyses possible in one dimension 
do not apply. On the other hand, knowing that chaos is possible in one di- 
mension both warns us that higher-dimensional systems may behave in 
very complicated ways, and suggests that we may be able to explain this 
complicated behavior by finding a chaotic one-dimensional subsystem. We 
have limited our examples to the reals or rationals, but some systems may 
be easier to understand if one extends the analysis to the complex num- 
bers. For example, a linear fractional map may or may not have fixed points 
when restricted to the real numbers, but such systems always have fixed 
points in the complex numbers. Other nonlinear systems may be composed 
of simple components with finite state sets like neural nets, or have an infi- 
nite but locally finite state set like cellular automata, or have state sets like 
the natural numbers or strings over a finite alphabet like Turing machines 
and other models of computation. Analyses of such systems can be shown 
to be impossible [77]. Even when limited to finite state sets, the analysis of
such systems can be shown to be possible, but practically unreasonable [2]. 

We close with the warning that realistic systems may be highly nonlinear 
and highly complicated, but also with the hope that as in the past, future 
generations of scientists and engineers will find simple enough models to 
solve societal problems. 
Ex 10.1. Show that if $\lim_{k\to\infty} f^{(k)}(x_0) = p$ and $f$ is a continuous function, then $p$ is a fixed point of $f$.


Ex 10.2. Let $x_{t+1} = f(x_t)$, where

$$f(x) = \begin{cases} x/2 + 1/4 & \text{if } x < 1/2, \\ x/2 + 1/2 & \text{if } 1/2 \le x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}$$

Show that for $x_0 < 1/2$, $\lim_{k\to\infty} f^{(k)}(x_0) = 1/2$, but for $x_0 > 1/2$, $\lim_{k\to\infty} f^{(k)}(x_0) = 1$. Does this contradict the results of the previous exercise? Can this system be enveloped by a linear fractional system? Does this violate Theorem 10.6.7?


Ex 10.3. Show that $x_{t+1} = \sin x_t$ has one fixed point, which is both locally and globally stable.

Hint: A local linear approximation is insufficient. Use a local nonlinear 
approximation to show local stability. 


Ex 10.4. Analyze the local and global stability of $x = 1$ in the system $x_{t+1} = x_t[1 + 2(1 - x_t)]$.
Ex 10.5. Analyze the local and global stability of $x = 1$ in the system $x_{t+1} = x_t[1 + 2.00001(1 - x_t)]$.
Ex 10.6. Show that 


Tt4+1 = xe[1 + rin xt 


has a locally stable period 2-cycle when r is slightly larger than 2. 


Ex 10.7. Let $g(x) = \sqrt{|x|}$. Show that Newton's method oscillates with period 2 for non-zero initial values.
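A quick floating-point experiment, which illustrates Ex 10.7 but does not prove it, shows the oscillation. The sketch applies Newton's step $x \mapsto x - g(x)/g'(x)$ with $g'(x) = \operatorname{sign}(x)/(2\sqrt{|x|})$ for $x \neq 0$. The helper name `newton_orbit` is hypothetical.

```python
import math

def newton_orbit(x0, steps):
    """Newton iterates x -> x - g(x)/g'(x) for g(x) = sqrt(|x|), starting at x0 != 0."""
    orbit = [x0]
    x = x0
    for _ in range(steps):
        g = math.sqrt(abs(x))
        dg = math.copysign(1.0, x) / (2.0 * g)   # g'(x) = sign(x) / (2 sqrt|x|)
        x = x - g / dg
        orbit.append(x)
    return orbit

print(newton_orbit(4.0, 4))   # [4.0, -4.0, 4.0, -4.0, 4.0]
```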


Ex 10.8. Let $x_{t+1} = f(x_t)$, where

$$f(x) = \begin{cases} -\sqrt{x} & \text{if } x \ge 0, \\ \sqrt{|x|} & \text{if } x < 0. \end{cases}$$

Show that this system has a repelling fixed point and an attracting 2-cycle. Is this 2-cycle globally stable?
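Before proving anything, one can watch orbits numerically. The sketch below assumes our reading of the map in Ex 10.8: $f(x) = -\sqrt{x}$ for $x \ge 0$ and $\sqrt{|x|}$ for $x < 0$. The names `f` and `orbit` are illustrative. Starting points far from 0 and very close to 0 both settle onto the alternation $1, -1, 1, \ldots$, while 0 itself stays put.

```python
import math

def f(x):
    """Map of Ex 10.8 (as we read it): -sqrt(x) for x >= 0, sqrt(|x|) for x < 0."""
    return -math.sqrt(x) if x >= 0 else math.sqrt(-x)

def orbit(x0, steps):
    """Return [x0, f(x0), ..., f^steps(x0)]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

print(f(1.0), f(-1.0))      # -1.0 1.0, the 2-cycle {1, -1}
print(orbit(16.0, 4))       # [16.0, -4.0, 2.0, -1.414..., 1.189...]
```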


Ex 10.9. Let $x_{t+1} = f(x_t)$, where


249 
F(z) = — 


Show that $x = 1/2$ separates the behavior of this system in the sense that if $x_0 < 1/2$, there is one type of behavior, and if $x_0 > 1/2$, there is a different behavior.


Ex 10.10. Consider the system 


tei = yee), 


Yt+1 = ye” t-te Ye) , 
Find a quantity that is invariant along trajectories. Discuss the convergence 
properties of this system in light of this invariant. Are there fixed points? 
Are any of these fixed points locally or globally stable? 


Ex 10.11. For the difference equation $x_{t+1} = 2x_t \bmod 1$, plot $x_{t+1}$ as a function of $x_t$. Find the fixed points in your diagram. Does the plot show a discontinuity? Plot $x_{t+2}$ as a function of $x_t$. Can you find the oscillations of period 2?


Ex 10.12. For $x_{t+1} = 2x_t \bmod 1$, plot $x_{t+3}$ as a function of $x_t$. How does this show that there are 2 distinct period 3 oscillations? Find these oscillations.
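Exact rational arithmetic is convenient for experimenting with the doubling map, because floating-point doubling loses one bit per step. Any $x \in [0,1)$ with $f^{(K)}(x) = x$ satisfies $(2^K - 1)x \in \mathbb{Z}$, so it has the form $j/(2^K - 1)$. The hypothetical helper below groups these points into cycles and keeps those of exact period $K$.

```python
from fractions import Fraction

def doubling(x):
    """One step of x_{t+1} = 2 x_t mod 1, exact on rationals."""
    return (2 * x) % 1

def cycles_of_period(K):
    """Distinct cycles of exact period K of the doubling map on [0, 1)."""
    n = 2 ** K - 1                 # every solution of f^K(x) = x is j / n
    seen, cycles = set(), []
    for j in range(n):
        x = Fraction(j, n)
        if x in seen:
            continue
        cycle = [x]
        y = doubling(x)
        while y != x:              # follow the orbit until it returns to x
            cycle.append(y)
            y = doubling(y)
        seen.update(cycle)
        if len(cycle) == K:        # skip cycles whose period is a proper divisor of K
            cycles.append(cycle)
    return cycles

print([[str(x) for x in c] for c in cycles_of_period(3)])
# [['1/7', '2/7', '4/7'], ['3/7', '6/7', '5/7']]
```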


Ex 10.13. A sequence is eventually periodic if there is an $r$ such that $x_r, x_{r+1}, \ldots$ is periodic of period $p$. Show that $x_{t+1} = 2x_t \bmod 1$ has many eventually periodic solutions by giving a procedure that takes $r$ and $p$ as input, and outputs an initial condition that gives a solution of eventual period $p$ after a run-in of $r$ steps.
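One possible procedure for Ex 10.13 is sketched below. It is an illustration under our own choices, not the book's construction. For $p \ge 2$ it targets the cycle through $y = 1/(2^p - 1)$, whose binary expansion $0.\overline{0\cdots01}$ has exact period $p$, and then halves $r$ times. The point $y/2$ has an even denominator, so it is not periodic, and no earlier point is periodic either. For $p = 1$ the target is the fixed point 0, reached through its other preimage $1/2$. The function names are hypothetical.

```python
from fractions import Fraction

def doubling(x):
    """One step of x_{t+1} = 2 x_t mod 1, exact on rationals."""
    return (2 * x) % 1

def exact_period(x, limit=64):
    """Smallest k <= limit with f^k(x) == x, or None if there is none."""
    y = x
    for k in range(1, limit + 1):
        y = doubling(y)
        if y == x:
            return k
    return None

def eventually_periodic_start(r, p):
    """Initial value with a run-in of exactly r steps followed by exact period p."""
    if r < 0 or p < 1:
        raise ValueError("need r >= 0 and p >= 1")
    if p == 1:
        # 0 is the only fixed point in [0, 1); 1/2 is its non-periodic preimage.
        return Fraction(0) if r == 0 else Fraction(1, 2 ** r)
    # y = 1/(2**p - 1) has exact period p; halving r times never wraps around.
    return Fraction(1, (2 ** p - 1) * 2 ** r)

x0 = eventually_periodic_start(2, 3)
print(x0)   # 1/28, with orbit 1/28, 1/14, 1/7, 2/7, 4/7, 1/7, ...
```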


Ex 10.14. Following our construction for period $K$, find an oscillation of period 5 for $x_{t+1} = 2x_t \bmod 1$. Show that this is not the only possible construction by finding another solution of period 5 for this equation.


Ex 10.15. Show that $x_{t+1} = 2x_t \bmod 1$ has at most $\lfloor (2^K - 1)/K \rfloor$ oscillations of period $K$. Can you find a better formula for the number of distinct oscillations of period $K$?
Ex 10.16. Is

$$x_{t+1} = 2x_t \bmod 1$$
a linear equation? Refer to the definition and notice that something is 
missing. Show that by filling in the missing part of the definition in different 


ways, you can declare the equation to be linear or you can show that it is 
nonlinear. 
Ex 10.17. Consider the one-dimensional system

$$x_{t+1} = f(x_t) = x_t[1 + r(a - x_t)]$$

for some fixed positive $a, r$ with $ar > 2$. Show that the system has two fixed points and at least one locally stable 2-cycle when $ar < \sqrt{6}$.


Ex 10.18. Consider the two-dimensional system
$x_{t+1} = f_1(x_t, y_t) = x_t(5 - x_t - y_t)/3,$
yeti = fo(xe, ye) = yy — 2)/3. 


Show that (2,0) is a locally stable fixed point of the system and (3/4, 5/4) 
is a fixed point that is not locally stable. 


Ex 10.19. Show that the two-dimensional system 


Ti41 = filet, ys) =z 1 +at+y)/3, 
Ye = folre, yr) =yA-—a2+y)/2, 


has four fixed points, but only one is locally stable. 


Ex 10.20. Show that

$$x_{t+1} = \frac{3x_t + 2}{2x_t + 1}$$

has two fixed points, $\lambda_0 > 0$ and $\lambda_1 < 0$, and that $\lambda_0$ is locally stable but $\lambda_1$ is repelling. Further, show that the initial value $x_0 = 0$ gives $x_n = f_{3n}/f_{3n-1}$, where $f_i$ is the $i$th Fibonacci number. Give a good estimate for how close $x_n$ is to $\lambda_0$. How does the recurrence behave for other initial values?
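The Fibonacci claim in Ex 10.20 is easy to test exactly before proving it. The sketch assumes that the recurrence reads $x_{t+1} = (3x_t + 2)/(2x_t + 1)$ (our reconstruction of the printed formula) and that $f_1 = f_2 = 1$. The fixed points solve $x^2 - x - 1 = 0$, so $\lambda_0$ is the golden ratio $\varphi$. The printed errors $x_n - \varphi$ alternate in sign and shrink quickly, which is worth comparing with $f'(\varphi)$ when estimating the rate. Names are illustrative.

```python
from fractions import Fraction
import math

def step(x):
    """x_{t+1} = (3 x_t + 2) / (2 x_t + 1), our reading of Ex 10.20."""
    return (3 * x + 2) / (2 * x + 1)

def fib(i):
    """i-th Fibonacci number with f_1 = f_2 = 1."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

phi = (1 + math.sqrt(5)) / 2    # positive root of x^2 - x - 1 = 0

x = Fraction(0)
for n in range(1, 7):
    x = step(x)
    assert x == Fraction(fib(3 * n), fib(3 * n - 1))
    print(n, x, float(x) - phi)
```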


Ex 10.21. Following the staircase method from Section 10.5.1, set up an 
iteration to calculate the fixed 1/(1—21n2) inside the interval (0,1). Also, 
show that your iteration converges to this fixed point. 


Ex 10.22. Find the invariant density for the linear fractional f(x) = 
-1 
. 7 Show by example that this invariant density is not attractive. Does 


x + 
the histogram for the trajectory starting at 0 look like the invariant density? 
Ex 10.23. Pick an aperiodic linear fractional system with complex eigen- 
values. Find the invariant density for your system. Pick an initial point 
and calculate the histogram for the trajectory starting at this initial point. 
Does the histogram look like the invariant density? 


Ex 10.24. Show that the fixed point x = 1 is globally stable for x4, = 
f(x) with 
6x 0<a<1/2, 
f(z) = 45-4 1/2<a<l, 
1 l<ga. 
Further show that there is no linear fractional system that bounds $f(x)$ as required by Theorem 10.6.7.


Ex 10.25. Find a value for $a$ such that

$$f(x) = x[1 + 2(1 - x)]$$

is bounded by a linear fractional system as in the hypotheses of Theorem 10.6.7. Use this to show that $x = 1$ is a globally stable fixed point for

$$x_{t+1} = x_t[1 + r(1 - x_t)]$$

for all $r \in (0, 2]$.


Appendix A 
Worked Examples 


In the first chapters of this book we consider $k$th-order linear recurrences, that is, equations of the form

$$s_n - c_1 s_{n-1} - c_2 s_{n-2} - \cdots - c_k s_{n-k} = w(n) \quad \text{for } n \ge k, \tag{L}$$

where the $c_i$ are complex scalars with $c_k \neq 0$, and $w$ is a complex-valued function. There we found a nice solution for the special case in which the forcing function has the form

$$w(n) = \lambda^n p(n),$$

where $\lambda$ is a fixed scalar and $p$ is a polynomial with complex coefficients. The initial value problems we consider in this appendix have this form. Our principal tool is Theorem 3.3.1, which we use to find a particular solution $(v_n)$. Throughout, $\lambda_1, \ldots, \lambda_\ell$ are the distinct eigenvalues of the recurrence.


A.1 All Simple Roots 


All examples in this section are second-order linear equations whose characteristic polynomial is

$$\mathrm{ch}(x) = x^2 - x - 6 = (x - 3)(x + 2),$$

which has the simple roots $\lambda_1 = 3$, $\lambda_2 = -2$. The general solution of the homogeneous system has the form $s_n = a_1 3^n + a_2(-2)^n$ for some $a_1, a_2 \in \mathbb{C}$, and for any $w$ the equation

$$s_n = s_{n-1} + 6s_{n-2} + w(n)$$

has the general solution

$$s_n = a_1 3^n + a_2(-2)^n + v_n,$$

where $v_n$ is a particular solution. The constants $a_1, a_2$ are determined by the initial conditions.
Theorem 3.3.1 can be used to find a particular solution to the nonhomogeneous equation

$$s_n = s_{n-1} + 6s_{n-2} + \lambda^n p(n),$$

where $p$ is a non-zero polynomial and $\lambda$ is a scalar. For these equations the following special case of the theorem applies.


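Solving (L) when there are no multiple roots
When ch(x) has no multiple roots, then $(s_n)$ is a solution to the equation

$$s_n - c_1 s_{n-1} - c_2 s_{n-2} - \cdots - c_k s_{n-k} = \lambda^n p(n) \tag{A.1}$$

if and only if

$$s_n = \sum_{i=1}^{k} a_i \lambda_i^n + \lambda^n n^\delta q(n), \tag{A.2}$$

where $a_1, \ldots, a_k \in \mathbb{C}$, $q(x)$ is a polynomial with $\deg(q) = \deg(p)$, and

$$\delta = \begin{cases} 0 & \text{if } \lambda \notin \{\lambda_1, \ldots, \lambda_k\}, \\ 1 & \text{if } \lambda \in \{\lambda_1, \ldots, \lambda_k\}. \end{cases} \tag{A.3}$$

A particular solution is $\lambda^n n^\delta q(n)$.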


Let’s first analyze the homogeneous initial value problem.
Example A.1.1. For any initial value problem with equation

$$s_n = s_{n-1} + 6s_{n-2}$$

we have $s_n = a_1 3^n + a_2(-2)^n$, and the initial conditions $s_0, s_1$ give

$$s_0 = a_1 + a_2 \quad\text{and}\quad s_1 = 3a_1 - 2a_2. \tag{A.4}$$

Multiplying the first equation by 3 and subtracting that from the second equation, we obtain

$$s_1 - 3s_0 = -5a_2 \quad\text{and}\quad a_2 = \frac{3s_0 - s_1}{5}.$$

Inserting this value for $a_2$ into the first equation of (A.4), we have

$$a_1 = s_0 - \frac{3s_0 - s_1}{5} = \frac{2s_0 + s_1}{5},$$

and the solution to the homogeneous initial value problem is

$$s_n = \frac{2s_0 + s_1}{5}\, 3^n + \frac{3s_0 - s_1}{5}\, (-2)^n.$$
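The elimination in Example A.1.1 applies unchanged whenever the two roots are distinct. Once a particular solution $v_n$ is known, subtract $v_0, v_1$ from the initial values and solve the same $2 \times 2$ system. The sketch below, with the hypothetical name `coefficients`, does this exactly with Fractions. Passing $v_0 = v_1 = 0$ reproduces Example A.1.1, and the tests also reproduce the constants of Examples A.1.2 and A.1.3.

```python
from fractions import Fraction

def coefficients(lam1, lam2, s0, s1, v0=0, v1=0):
    """Solve s0 = a1 + a2 + v0 and s1 = lam1*a1 + lam2*a2 + v1 for (a1, a2).

    lam1 != lam2 are the simple roots; v0, v1 are the first two values of a
    particular solution (both zero for the homogeneous problem).
    """
    u0, u1 = Fraction(s0 - v0), Fraction(s1 - v1)
    a2 = (lam1 * u0 - u1) / (lam1 - lam2)   # lam1*(first) - (second) eliminates a1
    a1 = u0 - a2
    return a1, a2

print(*coefficients(3, -2, 1, 1))   # 3/5 2/5
```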
Example A.1.2. Consider the second-order equation

$$s_n = s_{n-1} + 6s_{n-2} + 2^n,$$

which has $\deg(p) = 0$ and $\lambda = 2 \notin \{3, -2\}$, and so $\deg(q) = 0$ and $\delta = 0$. The above formula gives the particular solution $v_n = c2^n$, and the recurrence can be used to solve for $c$:

$$v_{n-1} + 6v_{n-2} + 2^n - v_n = c2^{n-1} + 6c2^{n-2} + 2^n - c2^n = 2^{n-1}(c + 3c + 2 - 2c).$$

Therefore, $2c + 2 = 0$, which gives the particular solution $v_n = -2^n$, and

$$s_n = a_1 3^n + a_2(-2)^n - 2^n \quad \text{for some } a_1, a_2 \in \mathbb{C}. \tag{A.5}$$

We can write this in terms of the initial values $s_0, s_1$, namely,

$$s_0 = a_1 + a_2 - 1 \quad\text{and}\quad s_1 = 3a_1 - 2a_2 - 2, \tag{A.6}$$

which can be solved for $a_1, a_2$ as in the previous example. Multiplying the first equation by 2 and adding it to the second, we have

$$2s_0 + s_1 = 5a_1 - 4 \quad\text{and}\quad a_1 = \frac{2s_0 + s_1 + 4}{5}.$$

Substituting this value of $a_1$ into the first equation of (A.6), we obtain

$$a_2 = s_0 + 1 - \frac{2s_0 + s_1 + 4}{5} = \frac{3s_0 - s_1 + 1}{5},$$

and (A.5) becomes

$$s_n = \frac{2s_0 + s_1 + 4}{5}\, 3^n + \frac{3s_0 - s_1 + 1}{5}\, (-2)^n - 2^n.$$

To satisfy ourselves that this has been done correctly, let us use the last expression to verify that

$$s_2 = \frac{9(2s_0 + s_1 + 4)}{5} + \frac{4(3s_0 - s_1 + 1)}{5} - 4 = s_1 + 6s_0 + 4,$$

which is consistent with the recurrence.
Example A.1.3. For the second-order equation

$$s_n = s_{n-1} + 6s_{n-2} + n2^n, \tag{A.7}$$

let us first check that there is no constant $c$ for which $w_n = c2^n$ is a particular solution. We have

$$\begin{aligned} w_{n-1} + 6w_{n-2} + n2^n - w_n &= c2^{n-1} + 6c2^{n-2} + n2^n - c2^n \\ &= 2^{n-1}(c + 3c + 2n - 2c) = 2^n(c + n), \end{aligned}$$

which cannot be zero for all $n$ when $c$ is a constant. Rather, since $\deg(p) = 1$ and $\lambda = 2 \notin \{3, -2\}$, the formula says that there exists a particular solution of the form $v_n = (an + b)2^n$ for some constants $a, b$. The recurrence gives

$$\begin{aligned} v_{n-1} + 6v_{n-2} + n2^n - v_n &= (an - a + b)2^{n-1} + 6(an - 2a + b)2^{n-2} + n2^n - (an + b)2^n \\ &= 2^{n-1}\big((an - a + b) + 3(an - 2a + b) + 2n - 2(an + b)\big) \\ &= 2^{n-1}\big((2a + 2)n + (-7a + 2b)\big), \end{aligned}$$

which must equal zero for all $n$; that is, $2a + 2 = 0$ and $-7a + 2b = 0$. This gives $a = -1$ and $2b = 7a = -7$, so $v_n = -(2n + 7)2^{n-1}$ is a particular solution of (A.7), and the general solution has the form

$$s_n = a_1 3^n + a_2(-2)^n - (2n + 7)2^{n-1}, \quad \text{where } a_1, a_2 \text{ are constants in } \mathbb{C}.$$

For initial values $s_0, s_1$, we obtain

$$s_0 = a_1 + a_2 - 7/2 \quad\text{and}\quad s_1 = 3a_1 - 2a_2 - 9.$$

Simultaneously solving this system of equations yields

$$a_1 = \frac{2s_0 + s_1 + 16}{5} \quad\text{and}\quad a_2 = \frac{6s_0 - 2s_1 + 3}{10},$$

which gives

$$s_n = \frac{2s_0 + s_1 + 16}{5}\, 3^n + \frac{6s_0 - 2s_1 + 3}{10}\, (-2)^n - (2n + 7)2^{n-1}.$$

Verifying this calculation for $n = 2$ shows that

$$s_2 = \frac{18s_0 + 9s_1 + 144}{5} + \frac{12s_0 - 4s_1 + 6}{5} - 22 = s_1 + 6s_0 + 8,$$

as required.
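The hand checks at $n = 2$ in Examples A.1.2 and A.1.3 can be extended to many $n$ at once. The sketch below is an illustrative check, not part of the method. It generates terms directly from $s_n = s_{n-1} + 6s_{n-2} + w(n)$ with exact Fractions and compares them with the two closed forms derived above. The function names are hypothetical.

```python
from fractions import Fraction

def terms(N, s0, s1, w):
    """s_0, ..., s_N from s_n = s_{n-1} + 6 s_{n-2} + w(n)."""
    s = [Fraction(s0), Fraction(s1)]
    for n in range(2, N + 1):
        s.append(s[n - 1] + 6 * s[n - 2] + w(n))
    return s

def closed_A12(n, s0, s1):
    """Closed form of Example A.1.2 (forcing 2^n)."""
    return (Fraction(2 * s0 + s1 + 4, 5) * 3 ** n
            + Fraction(3 * s0 - s1 + 1, 5) * (-2) ** n - 2 ** n)

def closed_A13(n, s0, s1):
    """Closed form of Example A.1.3 (forcing n 2^n)."""
    return (Fraction(2 * s0 + s1 + 16, 5) * 3 ** n
            + Fraction(6 * s0 - 2 * s1 + 3, 10) * (-2) ** n
            - Fraction((2 * n + 7) * 2 ** n, 2))   # (2n + 7) 2^(n-1)

print(terms(10, 1, 2, lambda n: n * 2 ** n) == [closed_A13(n, 1, 2) for n in range(11)])   # True
```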
Example A.1.4. For

$$s_n = s_{n-1} + 6s_{n-2} + 3^n,$$

$\deg(p) = 0$ and $\lambda = \lambda_1$, which means that $\delta = 1$, and the formula gives $v_n = 3^n bn$ for some $b \in \mathbb{C}$. (Note that $w_n = c3^n$ is a solution to the homogeneous equation and so cannot be a particular solution here. Also, $v_n = 3^n(bn + a)$ could be used here, but the constant term can be absorbed into the earlier coefficient of $3^n$.) To find $b$, use

$$v_{n-1} + 6v_{n-2} + 3^n - v_n = 3^{n-1}b(n - 1 + 2n - 4 - 3n) + 3^n = 3^{n-1}(-5b + 3),$$

and from this, $b = 3/5$ and $v_n = n3^{n+1}/5$. Therefore,

$$s_n = a_1 3^n + a_2(-2)^n + \frac{n3^{n+1}}{5},$$

and $s_0 = a_1 + a_2$, $s_1 = 3a_1 - 2a_2 + 9/5$ give

$$a_1 = \frac{2s_0 + s_1 - 9/5}{5}, \qquad a_2 = \frac{3s_0 - s_1 + 9/5}{5}.$$

Hence,

$$s_n = \frac{1}{5}\Big((2s_0 + s_1 - 9/5 + 3n)3^n + (3s_0 - s_1 + 9/5)(-2)^n\Big).$$


Example A.1.5. For

$$s_n = s_{n-1} + 6s_{n-2} + n(-2)^n,$$

$\deg(p) = 1$ and $\lambda = \lambda_2$, from which we obtain $\delta = 1$ and $v_n = (-2)^n q(n)$, where $q(n) = cn^2 + bn$ for some $b, c \in \mathbb{C}$. Then

$$v_{n-1} + 6v_{n-2} + n(-2)^n - v_n = (-2)^{n-1}\big(q(n - 1) - 3q(n - 2) - 2n + 2q(n)\big)$$

equals zero for all integers $n \ge 2$. In particular, from $n = 2$ and $n = 3$ we obtain

$$q(1) - 3q(0) + 2q(2) = 4, \qquad q(2) - 3q(1) + 2q(3) = 6,$$

which give $9c + 5b = 4$ and $19c + 5b = 6$, and so $c = 1/5$, $b = 11/25$. Therefore,

$$q(n) = \frac{5n^2 + 11n}{25}$$

and

$$s_n = a_1 3^n + \Big(a_2 + \frac{5n^2 + 11n}{25}\Big)(-2)^n.$$

Substituting $n = 0$ and $n = 1$, we have

$$s_0 = a_1 + a_2 \quad\text{and}\quad s_1 = 3a_1 - 2a_2 - 32/25,$$

which give

$$a_1 = \frac{2s_0 + s_1 + 32/25}{5} \quad\text{and}\quad a_2 = \frac{3s_0 - s_1 - 32/25}{5}.$$
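The method of undetermined coefficients ends in a small linear system: $9c + 5b = 4$ and $19c + 5b = 6$ in Example A.1.5. The sketch below solves that system by Cramer's rule in exact arithmetic. It then checks the resulting closed forms of Examples A.1.4 and A.1.5 against the recurrence. The helper names are hypothetical, and the check is an illustration rather than a proof.

```python
from fractions import Fraction as F

def solve_2x2(p, q, r, s, e, f):
    """Solve p*x + q*y = e and r*x + s*y = f by Cramer's rule."""
    det = p * s - q * r
    return F(e * s - q * f, det), F(p * f - e * r, det)

c, b = solve_2x2(9, 5, 19, 5, 4, 6)      # q(n) = c n^2 + b n in Example A.1.5

def terms(N, s0, s1, w):
    """s_0, ..., s_N from s_n = s_{n-1} + 6 s_{n-2} + w(n)."""
    s = [F(s0), F(s1)]
    for n in range(2, N + 1):
        s.append(s[n - 1] + 6 * s[n - 2] + w(n))
    return s

def closed_A14(n, s0, s1):
    """Closed form of Example A.1.4 (forcing 3^n)."""
    return ((2 * s0 + s1 - F(9, 5) + 3 * n) * 3 ** n + (3 * s0 - s1 + F(9, 5)) * (-2) ** n) / 5

def closed_A15(n, s0, s1):
    """Closed form of Example A.1.5 (forcing n (-2)^n)."""
    a1 = (2 * s0 + s1 + F(32, 25)) / 5
    a2 = (3 * s0 - s1 - F(32, 25)) / 5
    return a1 * 3 ** n + (a2 + c * n ** 2 + b * n) * (-2) ** n

print(c, b)   # 1/5 11/25
```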
A.2 One Multiple Root


Solving (L) when there is a single eigenvalue
When there is a single eigenvalue, $\lambda_1$, it has multiplicity $k$, and the rule for particular solutions simplifies to the following: $v_n = \lambda^n n^\delta q(n)$ is a particular solution, where $q$ is a polynomial with $\deg(q) = \deg(p)$ and

$$\delta = \begin{cases} 0 & \text{if } \lambda \neq \lambda_1, \\ k & \text{if } \lambda = \lambda_1. \end{cases} \tag{A.8}$$

If $\lambda \neq \lambda_1$, the solution is

$$s_n = \lambda_1^n q_1(n) + \lambda^n q(n),$$

where $\deg(q) = \deg(p)$, $\deg(q_1) < k$, and the coefficients of $q_1$ are determined from the initial conditions.
If $\lambda = \lambda_1$, the solution is

$$s_n = \lambda_1^n q_1(n) + \lambda^n n^k q(n) = \lambda^n Q(n),$$

and $\deg(Q) = k + \deg(p)$.

The five examples in this section are second-order equations, and each has the characteristic polynomial

$$\mathrm{ch}(x) = x^2 - 4x + 4 = (x - 2)^2.$$

This has the double root $\lambda_1 = 2$, which means that the general solution is

$$s_n = (a_1 n + a_2)2^n + v_n,$$

where $v_n$ is a particular solution. As in the last section, $v_n$ depends on $w$, and $a_1, a_2$ can be calculated from the initial conditions.


Example A.2.1. For any initial value problem with recurrence

$$s_n = 4s_{n-1} - 4s_{n-2},$$

we have $s_n = (a_1 n + a_2)2^n$, where the constants $a_1 = (s_1 - 2s_0)/2$ and $a_2 = s_0$ can be computed from the initial conditions $s_0, s_1$, and the solution is

$$s_n = \big((s_1 - 2s_0)n + 2s_0\big)2^{n-1}.$$
Example A.2.2. Consider the second-order equation

$$s_n = 4s_{n-1} - 4s_{n-2} + 3^n,$$

where $\deg(p) = 0$ and $\lambda = 3$ is not an eigenvalue of the recurrence. From the formula, a particular solution is $v_n = c3^n$ for some constant $c$, and

$$4v_{n-1} - 4v_{n-2} + 3^n - v_n = 3^{n-2}(12c - 4c + 9 - 9c) = 3^{n-2}(9 - c),$$

which equals zero for $c = 9$. This gives $v_n = 3^{n+2}$ and the general solution

$$s_n = (a_1 n + a_2)2^n + 3^{n+2} \quad \text{for some } a_1, a_2 \in \mathbb{C}.$$

For the initial conditions $s_0, s_1$, computation gives

$$s_n = \Big(\frac{s_1 - 2s_0 - 9}{2}\, n + (s_0 - 9)\Big)2^n + 3^{n+2}.$$


Example A.2.3. The second-order equation

$$s_n = 4s_{n-1} - 4s_{n-2} + n3^n$$

has $\deg(p) = 1$, and $\lambda = 3$ is again not an eigenvalue. Therefore, there are constants $a, b$ such that $v_n = (an + b)3^n$ is a particular solution. Also,

$$\begin{aligned} 4v_{n-1} - 4v_{n-2} + n3^n - v_n &= 3^{n-2}\big(12(an - a + b) - 4(an - 2a + b) + 9n - 9(an + b)\big) \\ &= 3^{n-2}\big((9 - a)n - (4a + b)\big), \end{aligned}$$

which equals zero for all integers $n \ge 2$ when $a = 9$ and $b = -4a = -36$. Therefore, $v_n = (9n - 36)3^n = (n - 4)3^{n+2}$ is a particular solution, and the general solution has the form

$$s_n = (a_1 n + a_2)2^n + (n - 4)3^{n+2}.$$

For initial values $s_0, s_1$ we obtain

$$s_n = \Big(\frac{s_1 - 2s_0 + 9}{2}\, n + (s_0 + 36)\Big)2^n + (n - 4)3^{n+2}.$$
Example A.2.4. The second-order equation

$$s_n = 4s_{n-1} - 4s_{n-2} + 2^n$$

has $\deg(p) = 0$, and $\lambda = 2$ is an eigenvalue. Since its multiplicity is 2, $\delta = 2$ holds in the formula, and $v_n = an^2 2^n$ is a particular solution for some $a \in \mathbb{C}$. Then

$$\begin{aligned} 4v_{n-1} - 4v_{n-2} + 2^n - v_n &= 2^n\big(2a(n - 1)^2 - a(n - 2)^2 + 1 - an^2\big) \\ &= 2^n(-2a + 1), \end{aligned}$$

implying $a = 1/2$, which means that $v_n = n^2 2^{n-1}$ is a particular solution. The general solution therefore has the form

$$s_n = (n^2 + a_1 n + a_2)2^{n-1},$$

and from the initial values $s_0, s_1$ we obtain

$$s_n = \big(n^2 + (s_1 - 1 - 2s_0)n + 2s_0\big)2^{n-1}.$$


Example A.2.5. The second-order equation

$$s_n = 4s_{n-1} - 4s_{n-2} + n2^n$$

has $\deg(p) = 1$, and $\lambda = 2$ is again the double eigenvalue of the recurrence, which means that $v_n = 2^n q(n)$ is a particular solution for some $q(n) = an^3 + bn^2$. Then

$$4v_{n-1} - 4v_{n-2} + n2^n - v_n = 2^n\big(2q(n - 1) - q(n - 2) + n - q(n)\big),$$

which must equal zero for all values of $n \ge 2$. From

$$2q(n - 1) - q(n - 2) + n - q(n) = 0 \quad \text{for } n = 2, 3$$

we obtain $a = 1/6$ and $b = 3a = 1/2$. Therefore, $v_n = (\tfrac{1}{3}n^3 + n^2)2^{n-1}$ is a particular solution, and the general solution has the form

$$s_n = \Big(\frac{n^3}{3} + n^2 + a_1 n + a_2\Big)2^{n-1}.$$

Initial values $s_0, s_1$ give

$$s_n = \Big(\frac{n^3}{3} + n^2 + \big(s_1 - 2s_0 - \tfrac{4}{3}\big)n + 2s_0\Big)2^{n-1}.$$

Since from (A.8) the solution is $2^n Q(n)$ with $Q(n)$ a polynomial of degree 3, the same solution could have been obtained by solving a system of four linear equations to find the coefficients of $Q(n)$. To obtain these coefficients as functions of only $s_0$ and $s_1$, the recurrence of Example A.2.5 could be used twice to give $s_2$ and $s_3$ in terms of $s_0$ and $s_1$.
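The alternative route just described can be sketched directly. Use the recurrence to get $s_2, s_3$, then solve the four equations $2^n Q(n) = s_n$ for $n = 0, 1, 2, 3$. The Gauss-Jordan helper below is a generic textbook elimination over the rationals. The names `solve` and `Q_coefficients` are hypothetical. The result should agree with $Q(n) = \frac{n^3}{6} + \frac{n^2}{2} + \frac{s_1 - 2s_0 - 4/3}{2}\,n + s_0$, which is the closed form above divided by 2.

```python
from fractions import Fraction as F

def solve(A, y):
    """Gauss-Jordan elimination over the rationals (A square and non-singular)."""
    n = len(A)
    M = [[F(v) for v in row] + [F(t)] for row, t in zip(A, y)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * z for x, z in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def Q_coefficients(s0, s1):
    """[q0, q1, q2, q3] with s_n = 2^n (q0 + q1 n + q2 n^2 + q3 n^3) in Example A.2.5."""
    s = [F(s0), F(s1)]
    for n in (2, 3):                                   # use the recurrence twice
        s.append(4 * s[n - 1] - 4 * s[n - 2] + n * 2 ** n)
    A = [[n ** k for k in range(4)] for n in range(4)]
    return solve(A, [s[n] / 2 ** n for n in range(4)])

print(*Q_coefficients(0, 0))   # 0 -2/3 1/2 1/6
```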
A.3 One Multiple Root, Several Simple Roots


Solving (L) when there is a multiple eigenvalue
When $m_1 \ge 2$ and $m_2 = \cdots = m_\ell = 1$ are the respective multiplicities of the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_\ell$ of the recurrence (L), then $v_n = \lambda^n n^\delta q(n)$ is a particular solution, where $q$ is a polynomial with $\deg(q) = \deg(p)$ and

$$\delta = \begin{cases} 0 & \text{if } \lambda \notin \{\lambda_1, \ldots, \lambda_\ell\}, \\ m_1 & \text{if } \lambda = \lambda_1, \\ 1 & \text{if } \lambda \in \{\lambda_2, \ldots, \lambda_\ell\}. \end{cases} \tag{A.9}$$

Since each of the last five examples had only one (double) root, the last alternative of (A.9) did not occur. The next example illustrates this case.


Example A.3.1. The third-order equation

$$s_n = 4s_{n-1} - 5s_{n-2} + 2s_{n-3} + 2^n$$

has $\deg(p) = 0$ and $\lambda = 2$. Since the characteristic polynomial factors as

$$\mathrm{ch}(x) = x^3 - 4x^2 + 5x - 2 = (x - 1)^2(x - 2),$$

$\lambda = 2$ is a simple root of ch(x) and $\delta = 1$, which gives the particular solution $v_n = an2^n$. Then

$$\begin{aligned} 4v_{n-1} - 5v_{n-2} + 2v_{n-3} + 2^n - v_n &= 2^{n-2}\big(8a(n - 1) - 5a(n - 2) + a(n - 3) + 4 - 4an\big) \\ &= 2^{n-2}(4 - a), \end{aligned}$$

so $v_n = 4n2^n = n2^{n+2}$ is a particular solution, and the general solution is

$$s_n = (a_1 n + a_2) + a_3 2^n + n2^{n+2} = (a_1 n + a_2) + (a_3 + 4n)2^n.$$

For initial values $s_0, s_1, s_2$ we obtain

$$s_0 = a_2 + a_3, \qquad s_1 = a_1 + a_2 + 2a_3 + 8, \qquad s_2 = 2a_1 + a_2 + 4a_3 + 32,$$

which implies

$$s_n = (-2s_0 + 3s_1 - s_2 + 8)n + (2s_1 - s_2 + 16) + (s_0 - 2s_1 + s_2 - 16)2^n + n2^{n+2}.$$
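Because every coefficient in the final formula of Example A.3.1 is an integer, plain integer arithmetic is enough to compare it with the third-order recurrence. The function names below are hypothetical, and the check is illustrative rather than a proof.

```python
def terms(N, s0, s1, s2):
    """s_0, ..., s_N from s_n = 4 s_{n-1} - 5 s_{n-2} + 2 s_{n-3} + 2^n."""
    s = [s0, s1, s2]
    for n in range(3, N + 1):
        s.append(4 * s[n - 1] - 5 * s[n - 2] + 2 * s[n - 3] + 2 ** n)
    return s

def closed_form(n, s0, s1, s2):
    """The solution of Example A.3.1 in terms of s0, s1, s2."""
    return ((-2 * s0 + 3 * s1 - s2 + 8) * n + (2 * s1 - s2 + 16)
            + (s0 - 2 * s1 + s2 - 16) * 2 ** n + n * 2 ** (n + 2))

print(terms(6, 0, 0, 0) == [closed_form(n, 0, 0, 0) for n in range(7)])   # True
```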
A.4 The Input is $\gamma_1^n p_1(n) + \gamma_2^n p_2(n)$


For this form of input, one can find a particular solution by taking the sum of particular solutions to two related equations; this works because the left-hand side of (L) is linear in $(s_n)$. Of course, there is nothing special about 2, so if the input is a sum of $j$ terms, one can find a particular solution by taking the sum of particular solutions to $j$ related equations.


Example A.4.1. For the second-order equation

$$s_n = 4s_{n-1} - 4s_{n-2} + n3^n + 2^n, \tag{A.10}$$

we break this equation into two equations, one for the input $n3^n$ and one for the input $2^n$:

$$v_n = 4v_{n-1} - 4v_{n-2} + n3^n, \tag{A.11}$$

$$w_n = 4w_{n-1} - 4w_{n-2} + 2^n. \tag{A.12}$$

From Example A.2.3, we have that $v_n = (n - 4)3^{n+2}$ is a particular solution to (A.11), and from Example A.2.4, $w_n = n^2 2^{n-1}$ is a particular solution to (A.12). Combining these we have that

$$(n - 4)3^{n+2} + n^2 2^{n-1}$$

is a particular solution to (A.10). The general solution to (A.10) is then

$$s_n = (a_1 n + a_2)2^n + (n - 4)3^{n+2} + n^2 2^{n-1},$$

where $a_1$ and $a_2$ depend on the initial conditions $s_0$ and $s_1$. Solving for $a_1$ and $a_2$, we find that

$$s_n = \Big(\Big(\frac{s_1}{2} - s_0 + 4\Big)n + s_0 + 36\Big)2^n + (n - 4)3^{n+2} + n^2 2^{n-1}.$$
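The superposition step in Example A.4.1 can be checked in two ways. First, the sum of the two particular solutions should leave zero residual in (A.10). Second, the final formula should reproduce terms generated by the recurrence. The sketch below does both with exact arithmetic. The names are hypothetical.

```python
from fractions import Fraction

def particular(n):
    """(n - 4) 3^(n+2) + n^2 2^(n-1): sum of the particular solutions of (A.11) and (A.12)."""
    return (n - 4) * 3 ** (n + 2) + Fraction(n * n * 2 ** n, 2)

def residual(v, n):
    """v_n - 4 v_{n-1} + 4 v_{n-2} - (n 3^n + 2^n); zero when v solves (A.10)."""
    return v(n) - 4 * v(n - 1) + 4 * v(n - 2) - (n * 3 ** n + 2 ** n)

def closed_form(n, s0, s1):
    """General solution of (A.10) matched to the initial values s0, s1."""
    a1 = Fraction(s1, 2) - s0 + 4
    return (a1 * n + s0 + 36) * 2 ** n + particular(n)

print(all(residual(particular, n) == 0 for n in range(2, 25)))   # True
```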
Appendix B
Complex Numbers


In La Géométrie (1637) René Descartes introduced the terms real and imaginary numbers. In 1936, a US mathematician named Arnold Dresden suggested that the term “imaginary” be changed to “normal,” since the imaginary axis is normal (that is, perpendicular) to the real axis. That term never caught on, and instead has been used for something different: a real number is called normal if the digits of its base-$b$ expansion behave in a suitably random manner for every (positive integer) base $b$.

Because the square of a real number cannot be negative, there is no real number whose square is $-1$, and this means that the simple quadratic equation $x^2 + 1 = 0$ has no real roots. Using notation introduced by Leonhard Euler in 1777, we reserve $i$ to mean a symbol for which $i^2 = -1$ holds, and then define the set of complex numbers to be

$$\mathbb{C} = \{a + bi : a, b \in \mathbb{R}\}.$$

There is a one-to-one correspondence between each element $a + bi$ in $\mathbb{C}$ and the point $(a, b)$ in the set $\mathbb{R}^2$. In this bijection, real numbers $a$ correspond to the points of the form $(a, 0)$, and imaginary numbers $0 + bi$ correspond to the points on the $y$-axis. If the operations of addition and scalar multiplication (using real scalars) on $\mathbb{C}$ are defined “coordinatewise,” namely, by

$$(a_1 + b_1 i) + (a_2 + b_2 i) = (a_1 + a_2) + (b_1 + b_2)i$$

and

$$c(a + bi) = (a + bi)c = (ca) + (cb)i,$$
the correspondence is actually a vector space isomorphism, and $\mathbb{C}$ is a two-dimensional real vector space under these operations. In particular, $\mathbb{C}$ is an abelian group under addition. This means that the operation of addition on the set $\mathbb{C}$ is a commutative, associative operation for which the real number 0 is the additive identity and every element of $\mathbb{C}$ has an additive inverse.

How is multiplication of two complex numbers defined? If we want multiplication to be associative and commutative and also to distribute over addition, then it turns out that there is only one way to define multiplication! This is because for any $a, b, c, d \in \mathbb{R}$,

$$\begin{aligned}
(a + bi)(c + di) &= (a + bi)c + (a + bi)(di) && \text{distributive law in } \mathbb{C} \\
&= (a + bi)c + \big((a + bi)d\big)i && \text{associative law in } \mathbb{C} \\
&= c(a + bi) + \big(d(a + bi)\big)i && \text{commutative law in } \mathbb{C} \\
&= (ca + cbi) + (da + dbi)i && \text{scalar multiplication in } \mathbb{R}^2 \\
&= (ac + bci) + (ad + bdi)i && \text{commutative law in } \mathbb{R} \\
&= (ac + bci) + (adi + bdi^2) && \text{distributive law in } \mathbb{C} \\
&= (ac + bci) + (-bd + adi) && \text{since } i^2 = -1 \\
&= (ac - bd) + (ad + bc)i && \text{definition of addition in } \mathbb{C}.
\end{aligned}$$


This shows that because C is a real vector space, the only associative, 
commutative, and distributive multiplication on C that extends the scalar 
multiplication by elements of R is 


(a + bi)(c + di) = (ac — bd) + (ad + bc)i. 
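The formula $(a + bi)(c + di) = (ac - bd) + (ad + bc)i$ can be compared with Python's built-in complex type. The sketch below also shows the classical trick of using the auxiliary product $(a + b)(c + d) = ac + ad + bc + bd$, which yields the imaginary part from only three real multiplications in total. The names `mul4` and `mul3` are hypothetical.

```python
def mul4(a, b, c, d):
    """(a + bi)(c + di) using the four real products ac, bd, ad, bc."""
    return (a * c - b * d, a * d + b * c)

def mul3(a, b, c, d):
    """The same product from three real products: ac, bd and (a + b)(c + d)."""
    ac, bd = a * c, b * d
    cross = (a + b) * (c + d)          # = ac + ad + bc + bd
    return (ac - bd, cross - ac - bd)  # imaginary part ad + bc

print(mul4(1, 2, 3, 4), mul3(1, 2, 3, 4), complex(1, 2) * complex(3, 4))
# (-5, 10) (-5, 10) (-5+10j)
```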


(Also note that $(a + 0i)(c + 0i) = ac$, which says that this multiplication is consistent with multiplication on the subset $\mathbb{R}$.) This definition of multiplication satisfies the commutative law for multiplication, and the real number 1 is the multiplicative identity. As you might already know, the set $\mathbb{C}$ with the operations of addition and multiplication defined above is the field of complex numbers. Also, $a$ is called the real part and $b$ the imaginary part of the complex number $a + bi$.

On the surface, the multiplication of two complex numbers $a + bi$, $c + di$ seems to require the four real multiplications $ac$, $bd$, $ad$, $bc$. Check that the auxiliary multiplication $(a + b)(c + d)$ can be used to reduce the number of multiplications to three.

Until now, our geometric representation of complex numbers has been in rectangular coordinates. Thinking in terms of polar coordinates, we can rewrite the complex number $z = a + ib$ as $z = |z|(\cos(\theta) + i\sin(\theta))$, where $|z| = \sqrt{a^2 + b^2}$ is called its modulus, and its argument $\theta$ is the angle between the positive $x$-axis and the vector $a + bi$. For the moment, we
write $e(\theta) = \cos(\theta) + i\sin(\theta)$, and then

$$z = |z|e(\theta) \quad \text{for some } 0 \le \theta < 2\pi.$$

For instance, two values are $e(\pi) = \cos(\pi) = -1$ and $e(2\pi) = \cos(2\pi) = 1$.
Let’s consider the function $e(\theta)$ from another point of view. From calculus you know that the functions $\cos(\theta)$ and $\sin(\theta)$ are functions of the real variable $\theta$ whose power series are

$$\cos(\theta) = \sum_{k \ge 0} \frac{(-1)^k \theta^{2k}}{(2k)!} \quad\text{and}\quad \sin(\theta) = \sum_{k \ge 0} \frac{(-1)^k \theta^{2k+1}}{(2k+1)!},$$

which converge at every real number $\theta$. Therefore, for any real number $y$ the complex number $e(y) = \cos(y) + i\sin(y)$ equals

$$e(y) = \sum_{k \ge 0} \frac{(iy)^{2k}}{(2k)!} + \sum_{k \ge 0} \frac{(iy)^{2k+1}}{(2k+1)!},$$

since $(-1)^k = i^{2k}$. If we can add these two series, then

$$e(y) = \sum_{k \ge 0} \frac{(iy)^k}{k!}, \tag{B.1}$$

which is reminiscent of the power series expansion $e^x = \sum_{k \ge 0} x^k/k!$. In fact we will show that these series can be added, and it then makes sense to define the complex exponential function $e^z$ of the complex variable $z = x + iy$ as

$$e^z = e^x(\cos(y) + i\sin(y)). \tag{B.2}$$

Once this is done, we can drop the notation $e(\theta)$ and write $z = |z|e^{i\theta}$, where $\theta$ is the argument of $z$.

We've gotten ahead of ourselves here, because we haven’t yet defined what we mean by convergence of power series with complex coefficients. As with the reals, for any sequence $(a_k)$ of complex numbers and any complex number $a$, the power series $\sum_{k \ge 0} a_k(z - a)^k$ is said to converge at the complex number $z = z_0$ if the sequence of partial sums $\sum_{k=0}^{n} a_k(z_0 - a)^k$ converges, where complex modulus is used for the absolute value. We will show that every power series has a radius of convergence $R$ (it might be infinite), which has the property that the series $\sum_{k \ge 0} |a_k(z - a)^k|$ converges for all $|z - a| < R$ and diverges for all $|z - a| > R$. Because $e^x = \sum_{k \ge 0} x^k/k!$ converges for all real numbers, this complex power series cannot have a finite radius of convergence. This means that the series for $e^z$ is absolutely convergent for all complex numbers $z$ and so can be rearranged, and $e^z$ as defined in (B.2) is a well-defined function on $\mathbb{C}$.
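A numerical illustration of (B.1) and (B.2), not a proof: partial sums of $\sum_k z^k/k!$ computed in complex floating point agree with $\cos(y) + i\sin(y)$ for $z = iy$, and with $e^x(\cos y + i\sin y)$ for general $z = x + iy$. The helper name `exp_series` and the choice of 60 terms are our own assumptions. They are adequate only for moderate $|z|$.

```python
import math

def exp_series(z, terms=60):
    """Partial sum of sum_{k >= 0} z^k / k! using the first `terms` terms."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)        # turns z^k/k! into z^(k+1)/(k+1)!
    return total

y = 1.0
print(exp_series(1j * y), complex(math.cos(y), math.sin(y)))   # nearly identical
print(exp_series(1j * math.pi))                                 # close to -1
```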
Remember that we want to prove that each power series $\sum_{k \ge 0} a_k(z - a)^k$ has a radius of convergence. We give a proof of this fact by showing that the supremum of the set

$$S = \{r \ge 0 : \text{there exists } M > 0 \text{ such that } |a_k|r^k \le M \text{ for every } k\}$$

is the radius of convergence. To do this we show that for $R = \sup(S)$ the series $\sum_{k \ge 0} |a_k(z - a)^k|$ converges for all $|z - a| < R$ and the series $\sum_{k \ge 0} a_k(z - a)^k$ diverges for all $|z - a| > R$. If the complex number $z$ satisfies $|z - a| = r > R$, then $r$ is not in $S$, which means that $|a_k||z - a|^k$ is unbounded and the series must diverge. On the other hand, for $r = |z - a| < R$ we can choose $\rho$ such that $r < \rho < R$. Then $\rho$ is an element of $S$ (some element of $S$ exceeds $\rho$, and whenever $s \in S$, every number in $[0, s]$ is also in $S$), and there exists $M > 0$ such that $|a_k|\rho^k \le M$ for every $k$. This gives

$$|a_k(z - a)^k| = |a_k|r^k = |a_k|\rho^k\Big(\frac{r}{\rho}\Big)^k \le M\Big(\frac{r}{\rho}\Big)^k,$$

where the ratio of the bounding geometric series $\sum_{k \ge 0} M(r/\rho)^k$ satisfies $0 \le r/\rho < 1$. Therefore, by comparison, the series $\sum_{k \ge 0} a_k(z - a)^k$ is absolutely convergent on $|z - a| < R$, and $R = \sup(S)$ is the radius of convergence of the power series.

Any power series $\sum_{k \ge 0} a_k(z - a)^k$ whose radius of convergence $R$ is positive can be shown to be a differentiable function on its disk of convergence. Also, its derivative is the power series $\psi = \sum_{k \ge 1} k a_k(z - a)^{k-1}$, and $\psi$ has the same radius of convergence as the original series. To see this, let $R_1$ be the radius of convergence of the series $\psi$; then $R_1 \le R$ follows from $|ka_k| \ge |a_k|$. To show that $R_1 \ge R$, we'll prove that $R_1 \ge r$ for all $0 < r < R$. The construction of $R$ implies that for any $\rho$ with $r < \rho < R$ there exists $M > 0$ such that $|a_k|\rho^k \le M$ for all $k$, and

$$|ka_k|r^{k-1} = \frac{k}{r}\,|a_k|\rho^k\Big(\frac{r}{\rho}\Big)^k \le \frac{M}{r}\,k\Big(\frac{r}{\rho}\Big)^k.$$

Since $0 < r < \rho$, we have $\lim_{k\to\infty} k(r/\rho)^k = 0$, so $|ka_k|r^{k-1}$ is bounded, giving $R_1 \ge r$ for all $0 < r < R$. From this we see that $R_1 \ge R$, as required.

To summarize, if we let $f(z) = \sum_{k \ge 0} a_k(z - a)^k$ for any power series with positive radius of convergence, then $f(z)$ is a differentiable complex-valued function on its disk of convergence. Further, its derivative is $f'(z) = \sum_{k \ge 0} D(a_k(z - a)^k)$, where $D$ is the differentiation operator on the space of polynomials. Repeating this argument, $f'(z)$ must also be differentiable on the same disk, and $f''(z) = \sum_{k \ge 0} D^2(a_k(z - a)^k)$. From this we see that $f(z)$ is infinitely differentiable on its disk of convergence.

Before leaving power series, we define the Taylor series of $f$ about $z = a$ as

$$\sum_{k \ge 0} \frac{D^k(f)(a)}{k!}(z - a)^k,$$

where $D^k(f)(a)$ is the $k$th derivative of $f(z)$ evaluated at $z = a$. It can be proved that the Taylor series of $f$ converges to $f(z)$ on some disk about $a$, provided $\sum_{n \ge 0} a_n(z - a)^n$ has a non-zero radius of convergence.

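Returning to the complex exponential function, from the formulas for the sine and cosine of the sum of two angles we have

$$e^{i(\theta_1 + \theta_2)} = [\cos(\theta_1)\cos(\theta_2) - \sin(\theta_1)\sin(\theta_2)] + i[\sin(\theta_1)\cos(\theta_2) + \sin(\theta_2)\cos(\theta_1)].$$

The definition of multiplication allows us to unwrap this identity to get

$$e^{i(\theta_1 + \theta_2)} = e^{i\theta_1}e^{i\theta_2},$$

and iteration of this process for any positive integer $m$ gives

$$e^{im\theta} = e^{i\theta + i(m-1)\theta} = e^{i\theta}e^{i(m-1)\theta} = (e^{i\theta})^m.$$

Together with the law $e^{x_1 + x_2} = e^{x_1}e^{x_2}$ for the real exponential and definition (B.2), this proves the laws of exponents

$$e^{z_1 + z_2} = e^{z_1}e^{z_2} \quad\text{and}\quad e^{mz} = (e^z)^m.$$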

When $\theta = 2\pi/m$, this formula becomes $(e^{i\theta})^m = e^{i2\pi} = 1$, which means that $z = e^{i\theta}$ is a root of the equation $z^m - 1 = 0$; it is called the principal $m$th root of unity. (The complex numbers satisfying $z^m - 1 = 0$ are called the $m$th roots of unity.) Writing the equation $z^m - 1 = 0$ in the form $z^m = 1$, we see that every power of the principal $m$th root of unity is also an $m$th root of unity, and since $e^{i\phi} = e^{i(\phi + 2\pi)}$ always holds, this gives $m$ different roots $e^{ik\theta}$, where $k = 0, 1, \ldots, m - 1$.

Since these roots are complex numbers that are equally separated on the unit circle $|z| = 1$, they are often called cyclotomic (or “circle dividing”). Because a polynomial equation of degree $m$ has at most $m$ roots in any field (refer to Exercise 8.23), these are all the roots of $z^m - 1 = 0$ in $\mathbb{C}$. Geometrically, they form the vertices of a regular $m$-gon inscribed in the complex unit circle with one vertex anchored at 1. Algebraically, they form a group $G_m$ under multiplication, and this group is isomorphic to the integers modulo $m$ under the map

$$e^{ik2\pi/m} \mapsto k \bmod m.$$
Among these roots are the primitive $m$th roots of unity, the ones that are not roots of $z^k - 1 = 0$ for any $0 < k < m$. (The principal root of unity is an example of a primitive root of unity.) Every primitive root of unity generates the group $G_m$, in the sense that the powers

$$\zeta, \zeta^2, \ldots, \zeta^m$$

of any primitive root $\zeta$ are all different, and so every $m$th root of unity can be written as a power of $\zeta$. For instance, every fifth root of unity except 1 is primitive, whereas there are only two primitive sixth roots of unity, the principal root $e^{i2\pi/6}$ and its multiplicative inverse $e^{-i2\pi/6}$. The roots of unity occur naturally in many contexts throughout this book.

The complex exponential function gives one of the most remarkable formulas in all of mathematics:

$$e^{i\pi} = -1.$$

This formula, sometimes called Euler’s formula, relates four of the fundamental mathematical constants, namely: $e$, the base of the natural logarithms; $\pi$, the ratio of the circumference of a circle to its diameter; $i$, the complex unit; and $-1$, the basic negative number.
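The counts of primitive roots quoted above (four for $m = 5$, two for $m = 6$) can be reproduced numerically. The sketch uses the standard fact, not proved in the text, that $e^{2\pi i k/m}$ is a primitive $m$th root of unity exactly when $\gcd(k, m) = 1$. The function names are hypothetical, and all comparisons are made up to floating-point tolerance.

```python
import cmath
from math import gcd

def roots_of_unity(m):
    """All m-th roots of unity e^{2 pi i k/m}, k = 0, ..., m - 1."""
    return [cmath.exp(2j * cmath.pi * k / m) for k in range(m)]

def primitive_roots_of_unity(m):
    """The primitive m-th roots: e^{2 pi i k/m} with gcd(k, m) = 1."""
    return [cmath.exp(2j * cmath.pi * k / m) for k in range(m) if gcd(k, m) == 1]

print(len(primitive_roots_of_unity(5)), len(primitive_roots_of_unity(6)))   # 4 2
```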

The fact that every polynomial of the form f(z) = z™ — 1 has m com- 
plex roots is not an anomaly, since every polynomial of degree m has m 
complex roots (where here roots are counted according to their multiplic- 
ity). This statement is usually referred to as the Fundamental Theorem 
of Algebra and is an example of a statement that is very easy to state 
but quite difficult to prove. Although a version of this result was conjec- 
tured sometime in the sixteenth century, Newton was probably the first 
to state the Fundamental Theorem in fairly modern terms when he wrote 
that every non-constant polynomial with real coefficients can be factored 
into a product of linear and quadratic polynomials with real coefficients. 
The theorem was finally proved by Gauss in his 1799 dissertation, and over 
his lifetime Gauss published at least three other proofs of the Fundamen- 
tal Theorem. Each proof is an existence proof, since it proves only that 
such roots exist and doesn’t give a method for explicitly constructing any 
of the roots. The roots of most polynomials are difficult or impossible to 
determine explicitly, but we’ve ignored this important practical problem 
by choosing polynomials with relatively obvious factorizations. 


Appendix C 
Highlights of Linear Algebra 


This appendix contains a review of the linear algebra we use in this book. 
It is essentially a synopsis of an introductory linear algebra course with a 
number of topics omitted and some new ones added. 


C.1 Vector Spaces and Subspaces 


Every pair of elements in a vector space V can be added together, and also 
each element can be multiplied by scalars. For us, the scalars are either real 
or complex numbers, and V is called a real vector space or a complex vector 
space according to whether R or C is used. (In general, the set of scalars 
must be a field as defined in Section 8.2.) In order for V to be a vector 
space, the operations of addition and scalar multiplication must satisfy 
a list of axioms requiring that both operations are associative, addition 
is commutative, and scalar multiplication distributes over addition. Also, 
there must be an additive identity and every element of V must have an 
additive inverse in V. The basic examples of vector spaces are $\mathbb{R}^n$ and $\mathbb{C}^n$ (with scalars from $\mathbb{R}$ and $\mathbb{C}$, respectively).

A nonempty subset W of V is called a subspace if W is also a vector 
space under the operations of V. Given a set S of vectors, we can form 
linear combinations of elements $v_1, \ldots, v_n$ from S, that is, sums of the form

$$\alpha_1 v_1 + \cdots + \alpha_n v_n,$$

where the $\alpha_i$ are scalars.


Every subspace is closed under formation of linear combinations, and it can 
be proved that any nonempty subset of a vector space that is closed under 
addition and scaling is a subspace. The subspace of all linear combinations of elements from S is called the span of S, which we will denote by $\mathrm{Span}(S)$; it is the smallest subspace containing S. A set S is called a spanning set for V when $\mathrm{Span}(S) = V$, which means that every element of V can be written as a linear combination of elements from S.

Although we will principally concentrate on subspaces of $\mathbb{R}^n$ and $\mathbb{C}^n$, there is a more general class of vector spaces that we encounter, the vector spaces of real-valued or complex-valued functions defined on some set X. Because a sequence is a function defined on the natural numbers, the complex vector space of all sequences of complex numbers is an example of such a vector space. Both addition and scalar multiplication are defined "componentwise". In other words, to add two sequences we add the $n$th term of the first sequence to the $n$th term of the second, and multiplying every element of a sequence by the scalar $a$ gives the scaled sequence. Considering an infinite sequence $(s_n)$ as the function $s(n) = s_n$ defined on $\mathbb{N}$, these componentwise vector space operations are the usual operations for functions,

$$(s+t)(n) = s(n) + t(n) \quad \text{and} \quad (as)(n) = a\,s(n).$$

As we said above, in general, the set of all functions from any set X into $\mathbb{R}$ or $\mathbb{C}$ forms a vector space under the usual operations of function addition and scaling. When $X = \mathbb{Z}$ we obtain the set of all doubly infinite sequences of real or complex numbers.
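To make the componentwise operations concrete, here is a small Python sketch in which a sequence is represented, as in the text, by the function $n \mapsto s(n)$; the helper names `add` and `scale` are illustrative only.

```python
def add(s, t):
    """Componentwise sum of two sequences: (s + t)(n) = s(n) + t(n)."""
    return lambda n: s(n) + t(n)


def scale(a, s):
    """Scalar multiple of a sequence: (a s)(n) = a * s(n)."""
    return lambda n: a * s(n)


def powers_of_two(n):
    return 2 ** n          # s_n = 2^n


def alternating(n):
    return (-1) ** n       # t_n = (-1)^n


u = add(scale(3, powers_of_two), alternating)   # u_n = 3 * 2^n + (-1)^n
print([u(n) for n in range(5)])                  # [4, 5, 13, 23, 49]
```

Because nothing restricts n to be nonnegative, the same helpers also work for doubly infinite sequences; for instance `u(-1)` evaluates $3 \cdot 2^{-1} + (-1)^{-1} = 0.5$.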


C.2 Linear Independence and Basis 


How many different linear combinations yield the same vector? A nonempty set S of vectors is called linearly independent if every element in $\mathrm{Span}(S)$ can be expressed as a linear combination of elements of S in only one way. Since $\mathrm{Span}(S)$ is a subspace of V, it contains the zero vector, and so linear independence means that whenever

$$\alpha_1 v_1 + \cdots + \alpha_n v_n = 0 = \beta_1 v_1 + \cdots + \beta_n v_n$$

holds we have $\alpha_i = \beta_i$ for all $i$. Since the choice of all $\beta_i = 0$ gives the zero vector, linear independence means that

$$\alpha_1 v_1 + \cdots + \alpha_n v_n = 0 \implies \alpha_i = 0 \text{ for all } i.$$


The last condition is often taken as the definition of linear independence. 
A nonempty subset $\mathcal{B} \subseteq V$ is called a basis for V if it is a linearly independent spanning set for V. In other words, $\mathcal{B}$ is a basis exactly when every vector can be expressed as a linear combination of elements from $\mathcal{B}$ in a unique way. When V has a basis with n elements, we can encode every element of V as an n-tuple. For instance, it can be checked that the set $\mathcal{B} = \{(1,2), (-1,1)\}$ is a basis for $V = \mathbb{R}^2$. Because the vector $v = (-2,3)$ satisfies

$$v = \tfrac{1}{3}(1,2) + \tfrac{7}{3}(-1,1),$$

its encoding in terms of the basis $\mathcal{B}$ is

$$v_{\mathcal{B}} = \begin{pmatrix} 1/3 \\ 7/3 \end{pmatrix}.$$

In general, when $\mathcal{B} = \{v_1, \ldots, v_n\}$ is a basis for V and $v = \alpha_1 v_1 + \cdots + \alpha_n v_n$, we write the coordinate vector of v relative to $\mathcal{B}$ as $v_{\mathcal{B}} = (\alpha_1, \ldots, \alpha_n)^T$.
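Finding a coordinate vector amounts to solving a linear system. Here is a minimal Python sketch for the two-dimensional example above, using Cramer's rule with exact `fractions.Fraction` arithmetic. The helper `coordinates_2d` is hypothetical, written only for this illustration.

```python
from fractions import Fraction


def coordinates_2d(b1, b2, v):
    """Solve alpha1*b1 + alpha2*b2 = v in R^2 by Cramer's rule (exact arithmetic)."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are linearly dependent, so not a basis")
    alpha1 = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    alpha2 = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return alpha1, alpha2


coords = coordinates_2d((1, 2), (-1, 1), (-2, 3))
print(coords)  # (Fraction(1, 3), Fraction(7, 3))
```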

The dimension of a vector space is defined to be the number of elements 
in a basis. This definition makes sense because any two bases for the same 
vector space must have the same number of elements. The proof of this 
fact can be broken into two cases, whether or not the vector space has 
a finite spanning set. In the first case the vector space is called finite- 
dimensional. The proof for this case proceeds by proving that if V has 
a spanning set with n elements, then any subset that has more than n 
elements must be linearly dependent. From this we know that any basis 
can have at most n elements. If $\mathcal{B}$ and $\mathcal{B}'$ are two bases for V, then $\mathcal{B}$ is a spanning set, and the linear independence of $\mathcal{B}'$ implies that the number of elements in $\mathcal{B}'$ is bounded above by the number of elements in $\mathcal{B}$. Reversing the roles of $\mathcal{B}$ and $\mathcal{B}'$ yields the desired result, namely, that $\mathcal{B}$ and $\mathcal{B}'$ have the same size. If there is no finite spanning set for V, the vector space V
is called infinite-dimensional. The above argument is too simplistic for 
this case, but there is a more sophisticated argument that works. We don’t 
give that argument here. 

When V is a finite-dimensional vector space, any subspace W is also finite-dimensional, and the dimension of any proper subspace $W \neq V$ is strictly less than the dimension of V.


C.3 Linear Transformations 


A linear transformation is a map between two vector spaces (they must have 
the same field of scalars) that preserves addition and scalar multiplication. 
In other words, a function $T\colon V \to W$ is a linear transformation if

$$T(av + v') = aT(v) + T(v') \quad \text{for all } v, v' \in V \text{ and all scalars } a.$$

Because of linearity, a linear transformation is completely determined by its action on a basis.

Linear transformations that are functions from a vector space to itself are called operators. We can represent a linear operator T on an n-dimensional vector space V with respect to a basis $\mathcal{B} = \{b_1, \ldots, b_n\}$ by the $n \times n$ matrix whose $i$th column is $T(b_i)$ as represented in the basis $\mathcal{B}$. We denote this matrix by $[T]_{\mathcal{B}}$. For example, if T is the operator on $\mathbb{R}^3$ defined by $T(x,y,z) = (3x + 2y, z, -z)$ and $\mathcal{B} = \{(1,0,0), (1,-1,0), (0,0,1)\}$, then $T(b_1) = (3,0,0) = 3b_1$, $T(b_2) = (1,0,0) = b_1$, $T(b_3) = (0,1,-1) = b_1 - b_2 - b_3$, and

$$[T]_{\mathcal{B}} = \begin{pmatrix} 3 & 1 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & -1 \end{pmatrix}.$$
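The recipe for $[T]_{\mathcal{B}}$ (column $i$ holds the coordinates of $T(b_i)$) can be carried out mechanically. The Python sketch below reproduces the matrix of the example. The small Gauss-Jordan solver `solve` is our own helper; it assumes an invertible coefficient matrix and uses exact rational arithmetic.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def T(x, y, z):
    """The operator T(x, y, z) = (3x + 2y, z, -z) from the text."""
    return (3 * x + 2 * y, z, -z)


basis = [(1, 0, 0), (1, -1, 0), (0, 0, 1)]          # b_1, b_2, b_3
B = [[b[i] for b in basis] for i in range(3)]        # basis vectors as columns
columns = [solve(B, T(*b)) for b in basis]           # T(b_i) in basis coordinates
T_B = [[columns[j][i] for j in range(3)] for i in range(3)]
print([[int(x) for x in row] for row in T_B])        # [[3, 1, 1], [0, 0, -1], [0, 0, -1]]
```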


Notice that we are assuming an inherent order to the elements in the basis, and the term ordered basis is normally used to emphasize this. For $A = [T]_{\mathcal{B}}$ we have the helpful identity

$$T(v)_{\mathcal{B}} = A\, v_{\mathcal{B}} \quad \text{for all } v \in V,$$

and T is an invertible operator iff A is an invertible matrix.¹ From this it follows that if we are given two pieces of data, an ordered basis $\mathcal{B}$ for an n-dimensional vector space and a linear operator T, then from this information we can obtain an $n \times n$ matrix A with the property that the coordinate vector of $T(v)$ relative to $\mathcal{B}$ is the result of multiplying A by the coordinate vector of v relative to $\mathcal{B}$. If $\mathcal{C} = \{c_1, \ldots, c_n\}$ is another ordered basis for V, consider the matrix P whose $i$th column is $c_i$ represented in the basis $\mathcal{B}$. This matrix P is often called the change of basis matrix from $\mathcal{C}$ to $\mathcal{B}$; it satisfies $P v_{\mathcal{C}} = v_{\mathcal{B}}$. Since the associated linear operator (the identity operator) is invertible, the matrix P is invertible and

$$T(v)_{\mathcal{C}} = P^{-1} A P\, v_{\mathcal{C}},$$

so the representation of T in the basis $\mathcal{C}$ is the matrix $P^{-1}AP$. Two matrices A and A' that are related by a matrix equation of the form

$$A' = P^{-1} A P$$

are called similar matrices, and they represent the same linear operator with respect to two different bases.
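As a check on the change of basis formula, the sketch below takes the basis $\mathcal{B}$ of the text to be the standard basis of $\mathbb{R}^3$, so that A is the ordinary matrix of $T(x,y,z) = (3x + 2y, z, -z)$, and takes $\mathcal{C} = \{(1,0,0), (1,-1,0), (0,0,1)\}$. Then $P^{-1}AP$ should reproduce the $3 \times 3$ matrix found for this basis in the example above. The helpers `mat_mul` and `inverse` are illustrative, not library functions.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(M):
    """Inverse of an invertible square matrix by Gauss-Jordan elimination (exact)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# A = [T] relative to the standard basis: columns are T(e_1), T(e_2), T(e_3).
A = [[3, 2, 0], [0, 0, 1], [0, 0, -1]]
# P: change of basis matrix from C to the standard basis; column i is c_i.
P = [[1, 1, 0], [0, -1, 0], [0, 0, 1]]
A_C = mat_mul(mat_mul(inverse(P), A), P)
print([[int(x) for x in row] for row in A_C])  # [[3, 1, 1], [0, 0, -1], [0, 0, -1]]
```

For instance, $v_{\mathcal{C}} = (2, -1, 5)^T$ describes $v = (1, 1, 5)$, and multiplying it by `A_C` gives $(10, -5, -5)^T$, the $\mathcal{C}$-coordinates of $T(v) = (5, 5, -5)$.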


C.4 Eigenvectors 


A non-zero vector $v \in V$ is called an eigenvector of the linear operator T if $T(v) = \lambda v$, and the scalar $\lambda$ is called the associated eigenvalue. Any set of eigenvectors corresponding to distinct eigenvalues is linearly independent.

When V is finite-dimensional and $A = [T]_{\mathcal{B}}$ is the matrix of T relative to any convenient basis $\mathcal{B}$, any eigenvector v with associated eigenvalue $\lambda$ satisfies $(A - \lambda I)v_{\mathcal{B}} = 0$. Since v is non-zero (and so $v_{\mathcal{B}}$ is non-zero), this implies that $A - \lambda I$ is a singular matrix and $\det(A - \lambda I)$ is zero. From this we see that the eigenvalues of T (which are also called the eigenvalues of the matrix A) are the roots of the polynomial $\mathrm{ch}_A(x) = \det(A - xI)$, the characteristic polynomial of A. Finding the roots of a general polynomial is a difficult problem, but our examples have been constructed so that the roots of the characteristic polynomials are relatively easy to find. Then finding the eigenvectors associated with an eigenvalue $\lambda$ amounts to using Gaussian elimination to solve the homogeneous system of equations $(A - \lambda I)v = 0$.

¹In this book, "iff" is shorthand for "if and only if".

When the vector space has a (finite) basis $\mathcal{B}_1$ of eigenvectors, the matrix representing T in the basis $\mathcal{B}_1$ is a diagonal matrix (and the diagonal elements are the respective eigenvalues). If A is again the matrix of T relative to any basis $\mathcal{B}$ and P is the change of basis matrix from $\mathcal{B}_1$ to $\mathcal{B}$, then $P^{-1}AP$ is a diagonal matrix and A is called diagonalizable. We summarize this very nice situation in the following theorem, whose proof can be found in [78, Chapter 6].


Theorem C.4.1. Let A be an $n \times n$ matrix with complex entries and characteristic polynomial $\mathrm{ch}_A(x) = \det(A - Ix)$. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of A, the complex roots of $\mathrm{ch}_A(x) = 0$ repeated according to multiplicity. Then:

(a) The matrix A is diagonalizable iff $\mathbb{C}^n$ has a basis consisting of eigenvectors of A.

(b) If all the $\lambda_i$ are distinct then A is diagonalizable.

(c) Suppose A is diagonalizable and $\{v_1, \ldots, v_n\}$ is a basis of eigenvectors, with $Av_i = \lambda_i v_i$. If P is the $n \times n$ matrix whose $i$th column is $v_i$, then $P^{-1}AP$ is the diagonal matrix D with diagonal entries $\lambda_1, \ldots, \lambda_n$.

(d) Real symmetric matrices are always diagonalizable.


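For us the most useful application of diagonalizability is that it allows the powers of A to be computed quickly, since $P^{-1}AP = D$ yields

$$A^n = P D^n P^{-1} \quad \text{for all } n \ge 0,$$

and $D^n$ is simply the diagonal matrix with diagonal entries $\lambda_1^n, \ldots, \lambda_n^n$.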
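Here is a small exact example of computing powers through diagonalization. It assumes the matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, chosen here only because its eigenvalues 3 and 1 and eigenvectors $(1,1)$ and $(1,-1)$ are easy to write down. The Python sketch compares $PD^nP^{-1}$ with repeated multiplication.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2, 1], [1, 2]]
P = [[1, 1], [1, -1]]              # columns: eigenvectors (1, 1) for 3 and (1, -1) for 1
P_inv = [[Fraction(1, 2), Fraction(1, 2)],
         [Fraction(1, 2), Fraction(-1, 2)]]
D = mat_mul(mat_mul(P_inv, A), P)  # P^{-1} A P, expected to be diag(3, 1)


def power_by_diagonalization(n):
    """A^n = P D^n P^{-1}; D^n just raises the diagonal entries to the n-th power."""
    D_n = [[D[0][0] ** n, 0], [0, D[1][1] ** n]]
    return mat_mul(mat_mul(P, D_n), P_inv)


def power_by_repeated_multiplication(n):
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


print([[int(x) for x in row] for row in D])                                   # [[3, 0], [0, 1]]
print(power_by_diagonalization(5) == power_by_repeated_multiplication(5))    # True
```

The diagonal route needs only two scalar powers however large n is, which is the point of the formula; repeated multiplication is included only as a reference.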

Not all square matrices are diagonalizable, but every square matrix A with complex entries has a Jordan form. This means that there exists an invertible matrix P with the property that $P^{-1}AP$ is composed of Jordan blocks on the diagonal, where a Jordan block is a bidiagonal matrix of the form

$$\begin{pmatrix} \lambda_j & 1 & 0 & \cdots & 0 \\ 0 & \lambda_j & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_j & 1 \\ 0 & 0 & \cdots & 0 & \lambda_j \end{pmatrix}$$

for some j.
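For example, the matrix

$$\begin{pmatrix} 3 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

is in Jordan form. It has three Jordan blocks: one of size 2 with $\lambda_1 = 3$, one of size 1 with $\lambda_2 = 3$, and one of size 1 with $\lambda_3 = 2$. The Jordan form $J = P^{-1}AP$ of A is unique up to the order of the individual Jordan blocks, the $\lambda$'s appearing in the blocks of the Jordan form are the eigenvalues of A, and the minimal and characteristic polynomials of A can be found from J. The proofs of all these facts and more information on Jordan form can be found in [78, Section 7.3]. Since $A^n = PJ^nP^{-1}$, powers of A are relatively easy to compute, because the $n$th power of a $k \times k$ Jordan block is

$$\begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}^{n} = \begin{pmatrix} \lambda^n & \binom{n}{1}\lambda^{n-1} & \binom{n}{2}\lambda^{n-2} & \cdots & \binom{n}{k-1}\lambda^{n-k+1} \\ 0 & \lambda^n & \binom{n}{1}\lambda^{n-1} & \cdots & \binom{n}{k-2}\lambda^{n-k+2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^n & \binom{n}{1}\lambda^{n-1} \\ 0 & 0 & \cdots & 0 & \lambda^n \end{pmatrix} \tag{C.1}$$

where $\binom{n}{m} = 0$ when $m > n$.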
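Formula (C.1) says that the $(i, j)$ entry of the $n$th power of a $k \times k$ Jordan block is $\binom{n}{j-i}\lambda^{n-(j-i)}$ for $j \ge i$, and 0 otherwise. The Python sketch below evaluates that closed form with `math.comb` and compares it with repeated multiplication. It is a numerical spot check, not a substitute for the induction asked for in Ex C.1.

```python
from math import comb


def jordan_block(lam, k):
    """k x k Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(k)]
            for i in range(k)]


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def jordan_power(lam, k, n):
    """J^n from formula (C.1): entry (i, j) is C(n, j - i) * lam^(n - (j - i)) for j >= i."""
    return [[comb(n, j - i) * lam ** (n - (j - i)) if 0 <= j - i <= n else 0
             for j in range(k)] for i in range(k)]


def jordan_power_direct(lam, k, n):
    """J^n by repeated multiplication, for comparison."""
    J = jordan_block(lam, k)
    result = [[int(i == j) for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = mat_mul(result, J)
    return result


print(jordan_power(3, 3, 4))  # [[81, 108, 54], [0, 81, 108], [0, 0, 81]]
```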


C.5 Characteristic and Minimal Polynomials 


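Recall that the characteristic polynomial of an $n \times n$ matrix A is defined as $\mathrm{ch}_A(x) = \det(A - Ix)$. For instance, the characteristic polynomial of

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

is $\mathrm{ch}_A(x) = x^2 + 1$, and we note that

$$\mathrm{ch}_A(A) = A^2 + I = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

This property holds in general.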

Before stating the theorem, let us make sure that the computation makes sense. Addition of matrices, like addition of vectors, is componentwise; that is, if A, B, and C are $n \times n$ matrices and $C = A + B$, then the $(i,j)$th entry of C is the sum of the $(i,j)$th entry of A and the $(i,j)$th entry of B. Scalar multiplication is also componentwise. So if c is a complex number and A is an $n \times n$ matrix, then cA is the $n \times n$ matrix whose $(i,j)$th entry is c times the $(i,j)$th entry of A. The product of two $n \times n$ matrices is the $n \times n$ matrix that represents the linear transformation obtained by applying one matrix and then the other; the product AB corresponds to applying B first and then A. Specifically, if $C = AB$, then $c_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j}$. Matrix multiplication is not commutative in general, but in the special case when we are computing powers of A, the order of the factors doesn't matter. Because taking powers, multiplying by a constant (scalar), and addition all make sense for $n \times n$ matrices, evaluating a polynomial at a matrix makes sense, with a constant term $c_0$ interpreted as $c_0 I$.
Theorem C.5.1 (The Cayley-Hamilton Theorem). For any $n \times n$ complex matrix A, let $\mathrm{ch}_A(x)$ be A's characteristic polynomial. Then

$$\mathrm{ch}_A(A) = 0;$$

that is, evaluating A's characteristic polynomial at A results in the $n \times n$ matrix which has every entry equal to zero.
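The Cayley-Hamilton Theorem is easy to spot-check with exact arithmetic. To get the characteristic polynomial without expanding a determinant, this Python sketch uses the Faddeev-LeVerrier recurrence, a standard technique that the book does not discuss. It produces the coefficients of $\det(xI - A) = (-1)^n \det(A - xI)$, which differs from $\mathrm{ch}_A(x)$ at most by a sign, so it vanishes at A exactly when $\mathrm{ch}_A$ does. Polynomials are stored as coefficient lists, lowest degree first.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(xI - A), lowest degree first.

    Faddeev-LeVerrier: M_0 = 0, M_k = A M_{k-1} + c_{n-k+1} I,
    c_{n-k} = -trace(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def evaluate_at_matrix(c, A):
    """p(A) = c_0 I + c_1 A + ... + c_n A^n by Horner's rule."""
    n = len(A)
    P = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        PA = mat_mul(P, A)
        P = [[PA[i][j] + (coeff if i == j else 0) for j in range(n)] for i in range(n)]
    return P


A = [[0, -1], [1, 0]]
print([int(x) for x in char_poly(A)])                                          # [1, 0, 1]
print(all(x == 0 for row in evaluate_at_matrix(char_poly(A), A) for x in row))  # True
```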


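There is another important polynomial associated with a square matrix A. From the Cayley-Hamilton Theorem we know that $\mathrm{ch}_A(x)$ is a non-zero polynomial $p(x)$ for which $p(A)$ is the zero matrix. The minimal polynomial $\mathrm{min}_A(x)$ of A is defined to be the non-zero polynomial $p(x)$ of least degree (with leading coefficient equal to 1) such that $p(A) = 0$. Dividing $\mathrm{ch}_A(x)$ by $\mathrm{min}_A(x)$ gives polynomials $q(x)$, $r(x)$ such that

$$\mathrm{ch}_A(x) = q(x)\,\mathrm{min}_A(x) + r(x), \quad \text{where } \deg(r(x)) < \deg(\mathrm{min}_A(x)).$$

Since $\mathrm{ch}_A(A) = 0 = \mathrm{min}_A(A)$, evaluating at A gives $r(A) = \mathrm{ch}_A(A) - q(A)\,\mathrm{min}_A(A) = 0$. If r were non-zero, dividing it by its leading coefficient would give a polynomial of degree smaller than $\deg(\mathrm{min}_A)$ that vanishes at A, so the minimality of $\deg(\mathrm{min}_A)$ forces the remainder $r(x)$ to be the zero polynomial. This proves the useful fact that the minimal polynomial always divides the characteristic polynomial.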
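For the Jordan-form example above, the characteristic polynomial $\det(xI - J)$ is $(x-3)^3(x-2)$, and the minimal polynomial is $(x-3)^2(x-2)$, because the size-2 block with eigenvalue 3 needs the square. The Python sketch below (helper names are ours) checks that $(x-3)^2(x-2)$ annihilates the matrix while the only smaller candidate having both eigenvalues as roots, $(x-3)(x-2)$, does not, and that dividing the characteristic polynomial by it leaves remainder zero, as the argument above predicts. Coefficient lists are lowest degree first.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_divmod(f, g):
    """Quotient and remainder of f divided by g (lowest degree first, g[-1] != 0)."""
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        factor = r[-1] / g[-1]
        q[shift] = factor
        for i, c in enumerate(g):
            r[i + shift] -= factor * c
        r.pop()                      # the leading coefficient is now zero
    return q, r


def at_matrix(p, A):
    """p(A) by Horner's rule, with the constant term read as c_0 I."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(p):
        result = [[sum(result[i][m] * A[m][j] for m in range(n)) + (c if i == j else 0)
                   for j in range(n)] for i in range(n)]
    return result


J = [[3, 1, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0], [0, 0, 0, 2]]
ch = poly_mul(poly_mul(poly_mul([-3, 1], [-3, 1]), [-3, 1]), [-2, 1])   # (x-3)^3 (x-2)
mn = poly_mul(poly_mul([-3, 1], [-3, 1]), [-2, 1])                        # (x-3)^2 (x-2)
smaller = poly_mul([-3, 1], [-2, 1])                                      # (x-3)(x-2)

print(all(x == 0 for row in at_matrix(mn, J) for x in row))       # True
print(all(x == 0 for row in at_matrix(smaller, J) for x in row))  # False
q, r = poly_divmod(ch, mn)
print([int(c) for c in q], [int(c) for c in r])                   # [-3, 1] [0, 0, 0]
```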


C.6 Exercises 


Ex C.1. Use induction to verify formula (C.1) for the powers of Jordan 
blocks. 


Ex C.2. Show that a companion matrix is diagonalizable if and only if it 
has distinct eigenvalues. 


Ex C.3. Prove that the Jordan form of a companion matrix A contains 
exactly one Jordan block for each eigenvalue of A. 


Ex C.4. If A is the companion matrix of a polynomial $f(x) \in \mathbb{C}[x]$, show that $f(x)$ is the characteristic polynomial of A.


Ex C.5. Suppose A is a $k \times k$ real matrix with k distinct real eigenvalues. Then there is a complex matrix P such that $P^{-1}AP$ is diagonal. Can you always find a real matrix P with this property?


Ex C.6. Consider computing the characteristic polynomial of a $k \times k$ matrix A. Then $A - Ix$ is a matrix whose off-diagonal entries are complex constants, and each diagonal entry is a linear polynomial of the form $a_{ii} - x$.

(a) Let M be a $k \times k$ matrix which has $k - m$ rows in which every entry is a constant and in the remaining m rows all but one entry is a constant and that entry is a monic linear polynomial. If no column contains two polynomial entries, show that $\det(M)$ is a polynomial of degree m.

(b) Construct an inductive argument to show the degree of the characteristic polynomial of a $k \times k$ matrix A is always k.


Appendix D 
Roots in the Unit Circle 


Our analysis of difference equations shows that the general solution of a 
recurrence converges to zero when all roots of the characteristic polynomial 
are less than 1 in absolute value. Since some roots of the characteristic 
polynomial might be nonreal, this condition means that all roots lie within 
the unit circle in the complex plane. In what follows we describe a method 
of Morris Marden [105, Chapter X] for calculating the number of roots of 
a polynomial within the unit circle. Here we specialize his technique to 
polynomials with real coefficients and recast it as an algorithm. 

Before we discuss the general method, we would like to consider the 
special example of nonnegative polynomials, which is the leading case for 
applications. Recall that a nonnegative polynomial is a polynomial p of 
the form 


$$p(x) = x^k - c_1 x^{k-1} - c_2 x^{k-2} - \cdots - c_{k-1} x - c_k,$$

where all $c_i \ge 0$ and $c_k > 0$. In Section 5.1 we prove some special properties of nonnegative polynomials. Among these, Theorem 5.1.3 says that any nonnegative polynomial has exactly one positive real root $\lambda_0$, which is dominant in the sense that any other root $\lambda$ satisfies $|\lambda| \le |\lambda_0|$. Further, from Corollary 5.1.5, when p is primitive (that is, when $\gcd\{i \mid c_i > 0\} = 1$), $\lambda_0$ is strictly dominant, the only root whose modulus has the maximum value. This result can be applied to construct polynomials whose roots lie inside the unit circle. For instance, if $q(x)$ is a real polynomial for which

$$p(x) = (x - 1)q(x) = x^k - c_1 x^{k-1} - c_2 x^{k-2} - \cdots - c_{k-1} x - c_k$$

is nonnegative, then $\lambda_0 = 1$ must be the dominant root of p. When p is a primitive polynomial, $\lambda_0 = 1$ is a strictly dominant root of p, and all roots of q therefore lie inside the unit circle. As an example, for $q(x) = x^2 + \frac{1}{2}x + \frac{1}{2}$, $p(x) = (x - 1)q(x) = x^3 - \frac{1}{2}x^2 - \frac{1}{2}$ is a primitive nonnegative polynomial. Therefore, all roots of q lie inside the unit circle, and the general solution of the recurrence $s_n = -\frac{1}{2}s_{n-1} - \frac{1}{2}s_{n-2}$ associated with q converges to zero. Of course, in this simple example it's easy to check that q has two complex roots and that the absolute value of each is $1/\sqrt{2}$.
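A quick floating-point check of this example follows; the helper `iterate` and the initial values are arbitrary choices made for illustration. The roots of q have modulus $1/\sqrt{2}$, and solutions of the recurrence with characteristic polynomial q die out. For contrast, solutions of $s_n = \frac{1}{2}s_{n-1} + \frac{1}{2}s_{n-3}$, whose characteristic polynomial p has the simple root 1, settle down to a constant rather than to zero.

```python
import cmath

# q(x) = x^2 + x/2 + 1/2: roots from the quadratic formula
disc = cmath.sqrt(0.5 ** 2 - 4 * 0.5)
roots = [(-0.5 + disc) / 2, (-0.5 - disc) / 2]
moduli = [abs(r) for r in roots]            # both equal 1/sqrt(2), about 0.7071


def iterate(coeffs, initial, steps):
    """Run s_n = sum_i coeffs[i] * s_{n-1-i}, starting from the given initial values."""
    s = list(initial)
    for _ in range(steps):
        s.append(sum(c * s[-1 - i] for i, c in enumerate(coeffs)))
    return s


# Characteristic polynomial q: s_n = -s_{n-1}/2 - s_{n-2}/2
decaying = iterate([-0.5, -0.5], [1.0, 5.0], 200)
# Characteristic polynomial p: s_n = s_{n-1}/2 + s_{n-3}/2
settling = iterate([0.5, 0.0, 0.5], [1.0, 5.0, 2.0], 200)
print(moduli, decaying[-1], settling[-1])
```

With these starting values the quantity $s_n + \frac{1}{2}s_{n-1} + \frac{1}{2}s_{n-2}$ is conserved by the second recurrence and equals 5, so its limit is $5/2$.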

A more complicated example is the generalized Fibonacci polynomial, $p(x) = x^k - x^{k-1} - \cdots - x - 1$ for integer $k \ge 2$. The dominant root $\lambda_0$ lies in the open interval $(1, 2)$ because $p(1) \le -1 < 0 < p(2)$. Dividing $p(x)$ by $(x - \lambda_0)$ gives the polynomial

$$q(x) = x^{k-1} + (\lambda_0 - 1)x^{k-2} + \cdots + (\lambda_0^{k-1} - \lambda_0^{k-2} - \cdots - 1).$$

Setting

$$P(x) = (x - 1)q(x) = x^k + (\lambda_0 - 2)x^{k-1} + (\lambda_0^2 - 2\lambda_0)x^{k-2} + \cdots + (\lambda_0^{k-1} - 2\lambda_0^{k-2})x - (\lambda_0^{k-1} - \lambda_0^{k-2} - \cdots - 1),$$

from $p(\lambda_0) = 0$ we obtain $\lambda_0(\lambda_0^{k-1} - \lambda_0^{k-2} - \cdots - 1) = 1$, and so the constant term in P is $-\lambda_0^{-1}$, which is negative. Since $1 < \lambda_0 < 2$, the other coefficients $\lambda_0^{j} - 2\lambda_0^{j-1} = \lambda_0^{j-1}(\lambda_0 - 2)$ are also negative, and we see that P is a primitive nonnegative polynomial, which implies that all roots of q must lie inside the unit circle. Since $p(x) = (x - \lambda_0)q(x)$, from this we obtain that all roots of p except $\lambda_0$ are within the unit circle.
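The argument can be spot-checked numerically for small k. The Python sketch below works in floating point and uses bisection on $(1, 2)$ as an assumed root-finding method. It computes $\lambda_0$, divides p by $x - \lambda_0$ with synthetic division to get q, and forms the coefficients of $P(x) = (x - 1)q(x)$. For each k they should all be negative after the leading 1, with constant term $-\lambda_0^{-1}$.

```python
def dominant_root(k, tol=1e-14):
    """Bisection for the root of p(x) = x^k - x^(k-1) - ... - x - 1 in (1, 2)."""
    def p(x):
        return x ** k - sum(x ** i for i in range(k))
    lo, hi = 1.0, 2.0                      # p(1) <= -1 < 0 < 1 = p(2)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def coefficients_of_P(k):
    """Return lam0 and the coefficients of P(x) = (x - 1) q(x), highest degree first,
    where q(x) = p(x) / (x - lam0) comes from synthetic division."""
    lam = dominant_root(k)
    q = [1.0]                              # p has coefficients 1, -1, -1, ..., -1
    for _ in range(k - 1):
        q.append(-1.0 + lam * q[-1])       # q_j = lam^j - lam^(j-1) - ... - 1
    P = [q[0]] + [q[j] - q[j - 1] for j in range(1, k)] + [-q[-1]]
    return lam, P


for k in range(2, 7):
    lam, P = coefficients_of_P(k)
    print(k, round(lam, 6), [round(c, 4) for c in P])
```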


D.1 Marden’s Method 


We now turn to Marden's method as applied to any polynomial f with real coefficients. The basic idea involves a kind of pairing between

$$f(x) = a_0 + a_1 x + \cdots + a_n x^n \quad \text{and its reciprocal polynomial} \quad f^R(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n.$$

The relationship between f and $f^R$ is such that for every non-zero root r of f inside the unit circle there's a corresponding root $r^{-1}$ of $f^R$ outside the unit circle, and conversely. This is true because $f^R(x) = x^n f(1/x)$, so that

$$f(x) = a\, x^k \prod_{i=1}^{n-k} (x - r_i) \quad \text{implies} \quad f^R(x) = A \prod_{i=1}^{n-k} \left(x - \frac{1}{r_i}\right),$$

where $a = a_n$, the $r_i$ are the non-zero roots of f, and $A = a \prod_{i=1}^{n-k} (-r_i)$.
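Storing a polynomial as its coefficient list $[a_0, a_1, \ldots, a_n]$, the reciprocal polynomial is just the reversed list. The short Python sketch below (the helper name is ours) checks the root correspondence on $f(x) = 2x^2 - 3x + 1 = (2x - 1)(x - 1)$, whose root $\frac{1}{2}$ is inside the unit circle and whose root 1 is on it.

```python
import cmath


def quadratic_roots(c0, c1, c2):
    """Roots of c0 + c1 x + c2 x^2 (c2 != 0) by the quadratic formula."""
    d = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [(-c1 + d) / (2 * c2), (-c1 - d) / (2 * c2)]


f = [1, -3, 2]        # f(x) = 1 - 3x + 2x^2, lowest degree first
f_R = f[::-1]         # f^R(x) = 2 - 3x + x^2: reverse the coefficient list
print(sorted(abs(r) for r in quadratic_roots(*f)))    # [0.5, 1.0]
print(sorted(abs(r) for r in quadratic_roots(*f_R)))  # [1.0, 2.0]
```

The root $\frac{1}{2}$ inside the circle becomes the root 2 outside it, while the root 1 on the circle stays on the circle.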
For any f, Marden constructs the associated polynomial

$$G(x) = a_0 f(x) - a_n f^R(x) = a_0(a_0 + a_1 x + \cdots + a_n x^n) - a_n(a_0 x^n + a_1 x^{n-1} + \cdots + a_n),$$

whose degree is at most $n - 1$ and whose constant term is $a_0^2 - a_n^2$. He then streamlines some results of A. Cohn [27] to obtain his Lemma 42.1, which we reword as the following theorem.


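Theorem D.1.1. Suppose $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ has p roots inside the unit circle and s roots on it. If $\delta(f) := a_0^2 - a_n^2$ is non-zero, then $G(x) = a_0 f(x) - a_n f^R(x)$ has either p roots inside the unit circle (as many as f) or $n - p - s$ roots inside the unit circle (as many as $f^R$), and the sign of its constant term $\delta(f)$ determines which holds. Namely, G has as many roots inside the unit circle as f when $\delta(f) > 0$, and as many as $f^R$ when $\delta(f) < 0$. Further, f and G have the same number of roots on the unit circle.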


Therefore, the number of roots of the constructed polynomial G that lie inside the unit circle is the same as the number of roots of either f or $f^R$ within the unit circle, and the sign of $\delta(f)$ indicates which holds. For our current purposes we'll say that G is equivalent to f when G and f have the same number of roots within the unit circle. (Otherwise, G is equivalent to $f^R$.)

For example, let $f(x) = x^2 - \frac{1}{3}x - \frac{1}{3}$. Then f has two roots inside the unit circle, and its reciprocal polynomial $f^R(x) = 1 - \frac{1}{3}x - \frac{1}{3}x^2$ has no roots inside the unit circle. The constructed polynomial is

$$G(x) = -\tfrac{1}{3}f(x) - f^R(x) = \tfrac{4}{9}x - \tfrac{8}{9},$$

and $\delta(f) = -\frac{8}{9} < 0$, which means that G is equivalent to $f^R$. Since the only root of G is $x = 2$, which lies outside the unit circle, G is indeed equivalent to $f^R$. Neatly enough, since $\delta(G) \neq 0$, in order to determine whether G has a root inside the unit circle we could have applied the same calculation to G. For this, we calculate $G^R(x) = \frac{4}{9} - \frac{8}{9}x$ and construct

$$-\tfrac{8}{9}G(x) - \tfrac{4}{9}G^R(x) = \left(\tfrac{8}{9}\right)^2 - \left(\tfrac{4}{9}\right)^2 = \tfrac{16}{27},$$

a constant polynomial that has $\delta(G) > 0$ and so is equivalent to G. Therefore, G (and also $f^R$) has no roots inside the unit circle.
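The construction $G = a_0 f - a_n f^R$ only needs coefficient arithmetic: the coefficient of $x^k$ in $f^R$ is $a_{n-k}$, so the coefficient of $x^k$ in G is $a_0 a_k - a_n a_{n-k}$. The Python sketch below (the name `marden_transform` is ours) repeats the example just given with exact fractions. The first entry of each result is the constant term $\delta$ whose sign the text uses.

```python
from fractions import Fraction as F


def marden_transform(a):
    """Coefficients of G = a_0 f - a_n f^R for f = a_0 + a_1 x + ... + a_n x^n.

    Input and output are coefficient lists, lowest degree first.  The x^n
    coefficient a_0 a_n - a_n a_0 cancels, so the result has n entries.
    """
    n = len(a) - 1
    return [a[0] * a[k] - a[n] * a[n - k] for k in range(n)]


f = [F(-1, 3), F(-1, 3), F(1)]     # f(x) = x^2 - x/3 - 1/3
G = marden_transform(f)
print(G)                           # [Fraction(-8, 9), Fraction(4, 9)]
print(marden_transform(G))         # [Fraction(16, 27)]
```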

The iteration used in this last example is the basis for Marden's algorithm for counting the number of roots inside the unit circle. At the start of the procedure we set $f_0 = f$, which may have up to $\deg(f) = n$ roots within the unit circle. The degree of the constructed polynomial $f_1(x) = a_0 f_0(x) - a_n f_0^R(x)$ is at most $n - 1$. Provided $\delta(f_1) \neq 0$, we can continue to use $f_1$ to construct $f_2$, and so forth.

Notice how this technique can be used to count the number of roots. When the procedure starts we have $f(x)$, which may have up to n roots within the unit circle. The constructed polynomial $f_1 = a_0 f - a_n f^R$ has at most $n - 1$ roots. If $f_1$ has the same number of roots as $f^R$ within the unit circle, then $f^R$ can have at most $n - 1$ roots within the circle, and hence must have at least one root outside the circle. But this root of $f^R$ that is outside the circle must correspond to a root of f that is inside the circle. So, when $f_1$ is equivalent to $f^R$, we can increase the count of f's roots inside the circle by 1. On the other hand, if $f_1$ is equivalent to f, we know that f has at most $n - 1$ roots within the unit circle, and we can proceed with counting the roots of $f_1$ to get the number of roots of f. The count does not change in this situation.

As mentioned earlier, the sign of the constant term of $f_1$ indicates whether it is equivalent to f or equivalent to $f^R$. So the algorithm will have a switch, called DELTA in the version below, to keep track of whether the current polynomial is equivalent to f or $f^R$. This switch will be updated when a new polynomial is constructed and will also tell the algorithm when to increment the count.

The algorithm breaks down when some $\delta(f_j)$ is zero. In this situation we want to obtain an equivalent polynomial EQUIV($f_j$) [105, Section 45]. Suppose we encounter

$$g(x) = f_j(x) = a_0 + a_1 x + \cdots + a_m x^m \quad \text{with} \quad \delta(g) = a_0^2 - a_m^2 = 0.$$

Then $a_m/a_0 = \pm 1$, which we call u. Construction of the equivalent polynomial EQUIV(g) depends on whether or not $(a_m, \ldots, a_0) = (u a_0, \ldots, u a_m)$.


Case 1. If $(a_m, \ldots, a_0) = (u a_0, \ldots, u a_m)$, replace g with

$$\mathrm{EQUIV}(g) = a_1 x^{m-1} + 2a_2 x^{m-2} + \cdots + (m-1)a_{m-1} x + m a_m,$$

the reciprocal polynomial of the derivative of g.

Case 2. If $(a_m, \ldots, a_0) \neq (u a_0, \ldots, u a_m)$, let $0 < q < m$ be the smallest subscript such that $u a_q \neq a_{m-q}$. Set $b = (a_{m-q} - u a_q)/a_m$ and

$$G(x) = \left(x^q + 2\frac{b}{|b|}\right) g(x) = \sum_{i=0}^{m+q} B_i x^i,$$

with

$$\mathrm{EQUIV}(g) = B_0 G(x) - B_{m+q} G^R(x),$$

a polynomial whose degree can be shown to be at most deg(g).
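The two cases of EQUIV translate directly into code. The sketch below is one possible reading of the construction, assuming coefficient lists stored lowest degree first, exact fractions, and $a_0 \neq 0$. It strips zero coefficients at the top so that the caller can see how much the degree dropped. The sample inputs are the polynomials from Examples D.1.2 and D.1.3 below.

```python
from fractions import Fraction


def equiv(g):
    """EQUIV(g) for g = [a_0, ..., a_m] (lowest degree first) with a_0^2 = a_m^2.

    Assumes a_0 != 0.  Zero coefficients at the top are stripped, so a shorter
    result signals that the degree dropped (the "update j" step).
    """
    g = [Fraction(c) for c in g]
    m = len(g) - 1
    u = g[m] / g[0]                                   # u = a_m / a_0 = +1 or -1
    if all(g[m - i] == u * g[i] for i in range(m + 1)):
        # Case 1: reciprocal polynomial of g'(x) = a_1 + 2 a_2 x + ... + m a_m x^(m-1)
        result = [i * g[i] for i in range(1, m + 1)][::-1]
    else:
        # Case 2: q is the smallest subscript with u a_q != a_(m-q)
        q = next(i for i in range(1, m + 1) if u * g[i] != g[m - i])
        b = (g[m - q] - u * g[q]) / g[m]
        sign = 1 if b > 0 else -1                      # b / |b|
        G = [Fraction(0)] * (m + q + 1)                # G(x) = (x^q + 2 b/|b|) g(x)
        for i, c in enumerate(g):
            G[i] += 2 * sign * c
            G[i + q] += c
        G_R = G[::-1]                                  # reciprocal polynomial of G
        result = [G[0] * G[i] - G[-1] * G_R[i] for i in range(len(G))]
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result


print([int(c) for c in equiv([1, 1, 1])])    # [2, 1], i.e. x + 2 (Case 1)
print([int(c) for c in equiv([-1, -1, 1])])  # [3, 5, -7], i.e. -7x^2 + 5x + 3 (Case 2)
```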
Marden's Algorithm
INPUT:  f(x) = a_0 + a_1 x + ... + a_n x^n,
        a polynomial with real coefficients.
OUTPUT: COUNT, the number of roots of f
        within the unit circle.

FOR s = 0 TO n
    a_s^(0) = a_s
ENDFOR
DELTA = 1; COUNT = 0

FOR j = 0 TO n-1
    IF |a_0^(j)| = |a_(n-j)^(j)|
        THEN EQUIV(a_0^(j) + ... + a_(n-j)^(j) x^(n-j))
             and update j if necessary.
    FOR k = 0 TO n-j-1
        a_k^(j+1) = a_0^(j) a_k^(j) - a_(n-j)^(j) a_(n-j-k)^(j)
    ENDFOR
    DELTA = DELTA × sgn(a_0^(j+1))
    IF DELTA < 0 THEN COUNT = COUNT + 1
ENDFOR

OUTPUT(COUNT)


As written, the algorithm uses $O(n^2)$ time, and it uses $O(n^2)$ space for storing a two-dimensional array to hold the various a's. By slightly changing the order in which the a's are calculated and writing over the old a's with the new a's, the space usage can be cut to $O(n)$. The $O(n^2)$ run time assumes that products and differences can be computed in constant time.
This may not be true. If one uses “real” computer arithmetic, then there 
may be catastrophic loss of accuracy, and extended precision may be neces- 
sary. Conversely, if rational or even integer arithmetic is used, the number 
of digits needed could double at each iteration, again forcing the use of 
extended precision. 
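Putting the pieces together, here is a Python sketch of the algorithm above. It follows the pseudocode with two liberties that are our own assumptions: it repeats the EQUIV replacement while the end coefficients still have equal absolute value (the pseudocode applies it once), and it represents the "update j" step by shortening the coefficient list. It uses exact `fractions.Fraction` arithmetic, which sidesteps round-off at the cost of the digit growth just described, and it repeats a compact EQUIV helper so that the block is self-contained. It assumes $a_0$ and the current leading coefficient are never both zero when EQUIV is needed.

```python
from fractions import Fraction


def equiv(g):
    """EQUIV(g) for g = [a_0, ..., a_m] with a_0^2 = a_m^2 and a_0 != 0."""
    m = len(g) - 1
    u = g[m] / g[0]
    if all(g[m - i] == u * g[i] for i in range(m + 1)):
        result = [i * g[i] for i in range(1, m + 1)][::-1]        # Case 1
    else:                                                           # Case 2
        q = next(i for i in range(1, m + 1) if u * g[i] != g[m - i])
        sign = 1 if (g[m - q] - u * g[q]) / g[m] > 0 else -1       # b / |b|
        G = [Fraction(0)] * (m + q + 1)
        for i, c in enumerate(g):
            G[i] += 2 * sign * c
            G[i + q] += c
        result = [G[0] * G[i] - G[-1] * G[-1 - i] for i in range(len(G))]
    while len(result) > 1 and result[-1] == 0:
        result.pop()                                                # degree drop
    return result


def marden_count(coeffs):
    """Number of roots of a_0 + a_1 x + ... + a_n x^n strictly inside the unit circle.

    coeffs = [a_0, ..., a_n], lowest degree first, with a_n != 0.
    """
    a = [Fraction(c) for c in coeffs]
    delta, count = 1, 0
    while len(a) > 1:                                  # current degree n - j >= 1
        while len(a) > 1 and abs(a[0]) == abs(a[-1]):
            a = equiv(a)                               # a shorter list means j jumps ahead
        if len(a) == 1:
            break
        m = len(a) - 1
        a = [a[0] * a[k] - a[m] * a[m - k] for k in range(m)]   # a^(j+1) from a^(j)
        delta = delta if a[0] > 0 else -delta          # DELTA = DELTA * sgn(a_0^(j+1))
        if delta < 0:
            count += 1
    return count


print(marden_count([1, 5, 1, -7]))  # 2, as in Example D.1.1
```

Run on the coefficient lists `[1, 5, 1, -7]`, `[1, 1, 1]`, `[-1, -1, 1]` and `[-1, 0, 1]`, it returns 2, 0, 1 and 0, the counts obtained by hand in Examples D.1.1 through D.1.4.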


Example D.1.1. Use Marden's algorithm to count the number of roots of $f(x) = -7x^3 + x^2 + 5x + 1$ inside the unit circle.

n = 3; DELTA = 1; COUNT = 0
j = 0
    k = 0: $a_0^{(1)} = a_0^{(0)}a_0^{(0)} - a_3^{(0)}a_3^{(0)} = 1 \cdot 1 - (-7)(-7) = -48$
    k = 1: $a_1^{(1)} = a_0^{(0)}a_1^{(0)} - a_3^{(0)}a_2^{(0)} = 1 \cdot 5 - (-7) \cdot 1 = 12$
    k = 2: $a_2^{(1)} = a_0^{(0)}a_2^{(0)} - a_3^{(0)}a_1^{(0)} = 1 \cdot 1 - (-7) \cdot 5 = 36$
    DELTA = 1 · (−1) = −1
    COUNT = 0 + 1 = 1
j = 1
    k = 0: $a_0^{(2)} = a_0^{(1)}a_0^{(1)} - a_2^{(1)}a_2^{(1)} = (-48)(-48) - (36)(36) = 1008$
    k = 1: $a_1^{(2)} = a_0^{(1)}a_1^{(1)} - a_2^{(1)}a_1^{(1)} = (-48)(12) - (36)(12) = -1008$
    DELTA = −1 · (1) = −1
    COUNT = 1 + 1 = 2
j = 2
    At this point, $|a_0^{(2)}| = |a_1^{(2)}|$, and EQUIV($f_2$) must be invoked. Continuing with our algorithm, Case 1 applies and gives EQUIV($1008 - 1008x$) = −1008. Updating to j = 3, the FOR loop terminates, and the algorithm outputs COUNT = 2 as the number of roots of $-7x^3 + x^2 + 5x + 1$ inside the unit circle. Notice that the root $x = 1$ is not inside the unit circle and is not counted.


Example D.1.2. Find the number of roots of $f(x) = x^2 + x + 1$ that lie inside the unit circle.

n = 2; DELTA = 1; COUNT = 0
j = 0
    Since $a_0^{(0)} = a_2^{(0)}$, EQUIV(f) must be found. Case 1 again applies and

    $$\mathrm{EQUIV}(x^2 + x + 1) = a_1 x + 2a_2 = x + 2.$$

    Since the degree of this polynomial is deg(f) − 1, j must be increased by 1, and $a_0^{(1)} = 2$, $a_1^{(1)} = 1$.
j = 1
    k = 0: $a_0^{(2)} = a_0^{(1)}a_0^{(1)} - a_1^{(1)}a_1^{(1)} = 4 - 1 = 3 > 0$.

DELTA remains positive and COUNT is not incremented. The FOR loop terminates, and 0 is returned as the number of roots inside the unit circle for the polynomial $x^2 + x + 1$. This is correct, because the two roots of this polynomial are the primitive third roots of unity, which lie on the unit circle.
Example D.1.3. We consider $f(x) = x^2 - x - 1$, the characteristic polynomial for the Fibonacci sequence. As we've noted many times, f has one root inside the unit circle and one root outside the unit circle. Applying Marden's algorithm, we obtain $a_0^{(0)} = -1$ and $a_2^{(0)} = 1$, giving $u = a_2/a_0 = -1$. But $(a_2, a_1, a_0) = (1, -1, -1) \neq u(a_0, a_1, a_2) = (1, 1, -1)$, and Case 2 with $q = 1$ applies. Then $b = -2$,

$$G(x) = (x - 2)g(x) = (x - 2)(x^2 - x - 1) = x^3 - 3x^2 + x + 2,$$

and

$$\mathrm{EQUIV}(g) = 2G(x) - G^R(x) = -7x^2 + 5x + 3.$$

Since deg(EQUIV(g)) = deg(f), j = 0 is unchanged, and the $a^{(0)}$'s are now the coefficients of $-7x^2 + 5x + 3$.

j = 0
    k = 0: $a_0^{(1)} = a_0^{(0)}a_0^{(0)} - a_2^{(0)}a_2^{(0)} = 9 - 49 = -40 < 0$
    k = 1: $a_1^{(1)} = a_0^{(0)}a_1^{(0)} - a_2^{(0)}a_1^{(0)} = 15 - (-35) = 50$
    DELTA = 1 · (−1) = −1
    COUNT = 0 + 1 = 1
j = 1
    k = 0: $a_0^{(2)} = a_0^{(1)}a_0^{(1)} - a_1^{(1)}a_1^{(1)} = (-40)(-40) - (50)(50) < 0$
    DELTA = −1 · (−1) = 1
    COUNT = 1

The FOR loop terminates and the algorithm correctly outputs COUNT = 1 for the number of roots of $x^2 - x - 1$ inside the unit circle.


Example D.1.4. As a final example we consider $f(x) = x^2 - 1$, which has the roots $\pm 1$. Applying Marden's algorithm, we obtain $\delta(f) = 0$, which means that EQUIV(f) must be invoked. Since $(a_2, a_1, a_0) = -(a_0, a_1, a_2)$, Case 1 of the replacement algorithm applies, and $\mathrm{EQUIV}(f) = a_1 x + 2a_2 = 2$. This results in a decrement of the degree by 2, and j must be incremented by 2. But then the FOR loop terminates, and the algorithm correctly outputs COUNT = 0.


D.2 Exercises 


Ex D.1. Use Marden's algorithm to show that $5x^n - 1$ has n roots inside the unit circle.


Ex D.2. Show that $x^{n-1} + 2x^{n-2} + 3x^{n-3} + \cdots + (n-1)x + n$ has no roots inside the unit circle by applying Marden's algorithm to $x^n + x^{n-1} + \cdots + x + 1$.


Ex D.3. Use Marden's algorithm twice to determine the number of roots inside the unit circle and the number of roots inside the circle of radius 2 for the polynomial $2x^3 - 3x^2 + 2x + 2$. (The actual roots are $-\frac{1}{2}$, $1 + i$, $1 - i$.)